Executive Summary


DevOps is a trend towards a tighter integration between development (Dev) and operations (Ops)
teams. The need for such an integration is driven by the requirement to continuously adapt
enterprise applications (EAs) to changes in the business environment. As of today, DevOps
concepts have been primarily introduced to ensure a constant flow of features and bug fixes into
new releases from a functional perspective. In order to integrate a non-functional perspective
into these DevOps concepts, this report focuses on tools, activities, and processes to ensure one
of the most important quality attributes of a software system, namely performance.

Performance describes a system's properties with respect to timeliness and use of resources.
Common metrics are response time, throughput, and resource utilization. Performance goals for EAs
are typically defined by setting upper and/or lower bounds for these metrics and specific
business transactions. In order to ensure that such performance goals can be met, several activities
are required during development and operation of these systems as well as during the transition
from Dev to Ops. Activities during development are typically summarized by the term Software
Performance Engineering (SPE), whereas activities during operations are called Application
Performance Management (APM). SPE and APM were historically tackled independently from
each other, but the newly emerging DevOps concepts require and enable a tighter integration
between both activity streams. This report presents existing solutions to support this integration
as well as open research challenges in this area.

The report starts by defining EAs and summarizing the characteristics that make performance
evaluations for these systems particularly challenging. It continues by describing our
understanding of DevOps and explaining the roots of this trend to set the context for the
remaining parts of the report. Afterwards, performance management activities that are common
to both life cycle phases are explained, before the particularities of SPE and APM are discussed in
separate sections. Finally, the report concludes by outlining activities and challenges to support
the rapid iteration between Dev and Ops.


1 Introduction 


DevOps has emerged in recent years to enable faster release cycles for complex Information
Technology (IT) services. DevOps is a set of principles and practices for smoothing out the
gap between development and operations in order to continuously deploy stable versions of an
application system (Hüttermann, 2012). Activities in both of these application life cycle phases
often pursue opposing goals. On the one hand, operations (Ops) teams want to keep the system
stable and favor fewer changes to the system. On the other hand, development (Dev) teams try
to build and deploy changes to an application system frequently. DevOps therefore aims at a
better integration of all activities in the software development and operation of an application
system life cycle, as outlined in Figure 1.1. This liaison reduces dispute and fosters consensus
between the conflicting goals of Dev and Ops.

Automation of build, deployment, and monitoring processes is a key success factor for a
successful implementation of the DevOps concept. Technologies and methods used to support the
DevOps concept include infrastructure as code, automation through deep modeling of systems,
continuous deployment, and continuous integration (Kim et al., 2014).


This report focuses on performance-relevant aspects of DevOps concepts. The coordination
and execution of all activities necessary to achieve performance goals during system development
are summarized as Software Performance Engineering (SPE) (Woodside et al., 2007).
Corresponding activities during operations are referred to as Application Performance Management
(APM) (Menascé, 2004). Recent approaches integrate these two activities and consider performance
management as a comprehensive assignment (Brunnert et al., 2014a). A holistic performance
management supports DevOps by integrating performance-relevant information.

The report summarizes basic concepts of performance management for DevOps and use cases
for all phases of an application life cycle. It focuses on technologies and methods in the context
of performance management that drive the integration of Dev and Ops. The report aims at
informing and educating DevOps-interested engineers, developers, architects, and managers as
well as all readers who are interested in DevOps performance management in general. In each
of the sections, representative solutions are outlined. However, we do not claim that all available
solutions are covered and are happy to hear about any solution that we have missed (find our
contact details at http://research.spec.org/devopswg), especially if they address some of the open
challenges covered in this report.

Figure 1.1: Performance evaluation in the application life cycle (Standard Performance
Evaluation Corporation, 2015)


After introducing the context of the report in Section 2, the remaining structure is aligned
to the different phases of an application life cycle. The underlying functionalities and
activities to measure and predict the performance of application systems are explained in Section 3.
Section 4 outlines performance management of DevOps activities in the development phase of
an application system. Section 5 presents performance management DevOps activities in the
operations phase. Section 6 describes how performance management can assist and improve the
evolution of application systems after the initial roll-out. The report concludes with a summary
and highlights challenges for further DevOps integration and performance management.


2 Context 


This section provides information about the general context of this technical report. Section 2.1
highlights the specific characteristics of enterprise applications from a performance
perspective. Furthermore, in Section 2.2 we outline the changes driven by the DevOps movement
that make a new view on performance management necessary.


2.1 Enterprise Application Performance 

The whole technical report focuses on a specific type of software system, namely enterprise
applications (EAs). This term is used to distinguish our perspective from other domains such as
embedded systems or work in the field of high performance computing. EAs support business
processes of corporations. This means that they may perform some tasks within a business
process automatically, but are used by end-users at some point. Therefore, they often contain
some parts that process data automatically and other parts that exhibit a user interface (UI)
and require interactions from humans. In the case of EAs, these humans can be employees, partners,
or customers.


According to Grinshpan (2012), performance-relevant characteristics of EAs include the
following. EAs are vital for corporate business functions; in particular, the performance of such
systems is critical for the execution of business tasks. These systems need to be adapted continuously
to an ever-changing environment and need customization in order to adjust to the unique
operational practices of an organization. Their architecture represents server farms with users
distributed geographically in numerous offices. EAs are accessed using a variety of front-end
programs and must be able to handle pacing workload intensities.

Even though we agree with the view on the performance characteristics of EAs as outlined by
Grinshpan (2012), we left some of his points out on purpose. A main difference in our viewpoint
is that EAs may expose UIs to customers as websites on the Internet, as in e-commerce
companies like Amazon. This perspective differs from that of Grinshpan (2012),
as he limits the number of EA users to a controllable number of employees or partners.
Internet-facing websites may be used by an unpredictable number of customers. This characteristic poses
specific challenges for capacity planning and management activities.

Performance of EAs is described by the metrics response time, throughput, and resource
utilization. Therefore, performance goals are typically defined by setting upper and/or lower
bounds for these metrics and specific business transactions. In order to ensure that such
performance goals can be met, several activities are required during development and operation of
these systems as well as during the transition from Dev to Ops.
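The notion of a performance goal as a bound on a metric for a specific business transaction can be made concrete with a small sketch. The class name, metric names, transaction name, and threshold values below are hypothetical and chosen only for illustration; they are not taken from this report.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PerformanceGoal:
    """Bounds for one metric of one business transaction (illustrative names)."""
    transaction: str
    metric: str                      # e.g. "response_time_ms" or "throughput_rps"
    lower: Optional[float] = None    # None means "no lower bound"
    upper: Optional[float] = None    # None means "no upper bound"

    def is_met(self, measured: float) -> bool:
        if self.lower is not None and measured < self.lower:
            return False
        if self.upper is not None and measured > self.upper:
            return False
        return True


def violated_goals(goals, measurements):
    """Return the goals whose measured value is missing or out of bounds."""
    violated = []
    for goal in goals:
        value = measurements.get((goal.transaction, goal.metric))
        if value is None or not goal.is_met(value):
            violated.append(goal)
    return violated


# Hypothetical goals: an upper bound on response time and utilization,
# and a lower bound on throughput.
goals = [
    PerformanceGoal("checkout", "response_time_ms", upper=500.0),
    PerformanceGoal("checkout", "throughput_rps", lower=50.0),
    PerformanceGoal("checkout", "cpu_utilization", upper=0.7),
]
measurements = {
    ("checkout", "response_time_ms"): 620.0,
    ("checkout", "throughput_rps"): 75.0,
    ("checkout", "cpu_utilization"): 0.55,
}
failed = violated_goals(goals, measurements)
print([(g.transaction, g.metric) for g in failed])
```

The same goal definitions could, under these assumptions, be evaluated against test results during Dev and against monitoring data during Ops.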

2.2 DevOps 

DevOps indicates an ongoing trend towards a tighter integration between development (Dev)
and operations (Ops) teams within or across organizations (Kim et al., 2014). According to Kim
et al. (2014), the term DevOps was initially coined by Debois and Shafer in 2008 and became
widely used after a Flickr presentation in 2009.¹ The goal of the Dev and Ops integration is
to enable IT organizations to react more flexibly to changes in the business environment (Sharma
and Coyne, 2015). As outlined in the previous section, EAs support or enable business processes.
Therefore, any change in the business environment often leads to changing requirements for an
EA. This constant flow of changes is not well supported by release cycles of months or years.
Therefore, a key goal of the DevOps movement is to allow for a more frequent roll-out of new
features and bug fixes in a matter of minutes, hours, or days.

This change can be supported by organizational and technical means. From an organizational
perspective, the tighter integration of Dev and Ops teams can be realized by restructuring an
organization. This can, for example, be achieved by setting up mixed Dev and Ops teams
for single EAs that have end-to-end responsibility for the development and roll-out of an EA.
Another example would be to put integrated (agile) processes in place that force Dev and Ops
teams to work more closely together. From a technical perspective, this integration can be supported
by automating as many routine tasks as possible. These routine tasks include things such as
compiling the code, deploying new EA versions, performing regression tests, and moving an
EA version from test systems to a production environment. For such purposes, Continuous
Integration (CI) systems have been introduced and are now extended to Continuous Delivery
(CD) or Continuous Deployment (CDE) systems (Humble and Farley, 2010). The differentiation
between CI, CD, and CDE is mostly done by the number of tasks these systems automate.
Whereas CI systems often only compile and deploy a new EA version, CD systems also automate
the testing tasks until an EA version exists that can be used as a release candidate. CDE describes an
extension to CD that automatically deploys a release candidate to production.

Even though the software engineering community in research and practice has already
embraced the changes by introducing approaches for CI, CD, and CDE, a performance perspective
for these new approaches is still missing. Specifically, the challenges of the two performance
domains SPE and APM are often considered independently from each other (Brunnert et al.,
2014a). In order to support the technical and organizational changes under the DevOps umbrella
driven by the need to realize more frequent release cycles, this conventional thinking of looking
at performance activities during Dev (SPE) and during Ops (APM) independently from each
other needs to be changed. This report outlines existing technologies to support the SPE and
APM integration and outlines open challenges.

3 Performance Management Activities 

Even though performance management activities have slightly different challenges during Dev
and Ops, there are a lot of commonalities in the basic methods used. These common methods
are outlined in this section. Section 3.1 starts with the most fundamental performance
management activity, which is measurement-based performance evaluation. As measurement-based
performance evaluation methods always have the drawback of requiring a system on which to measure
performance metrics, model-based performance evaluation methods have been developed in order
to overcome this requirement. Therefore, Section 3.2 focuses on performance modeling
methods. Finally, Section 3.3 outlines existing approaches to extract performance models and open
challenges for performance model extraction techniques.

¹ http://itrevolution.com/the-convergence-of-devops/


3.1 Measurement-Based Performance Evaluation 

Measurement-based performance evaluation describes the activity of measuring and analyzing
performance characteristics from an executing EA. Measurement data can be obtained with
event-driven and sampling-based techniques (Lilja, 2005; Menascé and Almeida, 2002).
Event-driven techniques collect a measurement whenever a relevant event occurs in the system, e.g.,
invocation of a certain method. Sampling-based techniques collect a measurement at fixed time
intervals, e.g., every second. The tools that collect the measurements are called monitors and
are divided into hardware monitors (typically part of hardware devices, e.g., Central Processing
Unit (CPU), memory, and hard disk drive (HDD)) and software monitors.
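The difference between the two software monitoring techniques can be sketched as follows. This is a minimal illustration, not the design of any particular monitoring tool: the decorator `event_driven`, the class `Sampler`, and the monitored function `handle_request` are hypothetical names introduced here.

```python
import functools
import threading
import time

event_log = []     # one (operation name, duration in seconds) entry per invocation
sample_log = []    # one probe value per sampling interval


def event_driven(func):
    """Event-driven monitor: record a measurement whenever func is invoked."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            event_log.append((func.__name__, time.perf_counter() - start))
    return wrapper


class Sampler(threading.Thread):
    """Sampling-based monitor: read a probe at fixed time intervals."""

    def __init__(self, probe, interval_s):
        super().__init__(daemon=True)
        self._probe = probe
        self._interval_s = interval_s
        self._stop_event = threading.Event()

    def run(self):
        while not self._stop_event.is_set():
            sample_log.append(self._probe())
            self._stop_event.wait(self._interval_s)

    def stop(self):
        self._stop_event.set()
        self.join()


state = {"in_flight": 0}   # number of requests currently being processed


@event_driven
def handle_request():
    state["in_flight"] += 1
    time.sleep(0.01)       # stand-in for real work
    state["in_flight"] -= 1


sampler = Sampler(probe=lambda: state["in_flight"], interval_s=0.005)
sampler.start()
for _ in range(5):
    handle_request()
sampler.stop()
print(len(event_log), "events,", len(sample_log), "samples")
```

The event-driven monitor yields exactly one record per invocation, whereas the number of samples depends only on the elapsed time and the chosen interval.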

Integrating software monitors into an application is called instrumentation. Instrumentation
techniques can be categorized into direct code modification, indirect code modification using
aspect-oriented programming or compiler modification, and middleware interception (Jain, 1991;
Lilja, 2005; Kiczales et al., 1997; Menascé and Almeida, 2002). Instrumentation can further be
classified as static when the instrumentation is done at design or compile time, and as dynamic if the
instrumentation is done at runtime without restarting the system.

The instrumentation and the execution of monitors can alter the behavior of the system at 
runtime. Software monitors can change the control flow by executing code that is responsible for 
creating measurements. They also compete for shared resources like CPU, memory, and storage.
The impact of the instrumentation on response times and resource utilization is often called 
measurement overhead. The degree of measurement overhead depends on the instrumentation 
granularity (e.g., a single method, all methods of an interface, or all methods of a component), 
the monitoring strategy (event-driven vs. sampling-based), instrumentation strategy (static vs. 
dynamic), and also the types and quality of the employed monitors. 
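Measurement overhead can be estimated by timing the same operation with and without a probe. The following sketch assumes a synthetic workload (`work`) and a simple event-driven probe; both are hypothetical, and the resulting numbers depend on the platform and on timer resolution, so no specific values should be expected.

```python
import time


def work(n=2000):
    """Stand-in for a monitored operation."""
    return sum(i * i for i in range(n))


def instrumented_work(log, n=2000):
    """The same operation wrapped in an event-driven probe."""
    start = time.perf_counter()
    result = work(n)
    log.append(time.perf_counter() - start)
    return result


def mean_duration(func, repetitions=2000):
    """Mean wall-clock time per call over a number of repetitions."""
    start = time.perf_counter()
    for _ in range(repetitions):
        func()
    return (time.perf_counter() - start) / repetitions


log = []
baseline = mean_duration(work)
# The lambda adds a small call cost of its own, which is counted as overhead here.
monitored = mean_duration(lambda: instrumented_work(log))
overhead = monitored - baseline
print(f"baseline {baseline * 1e6:.1f} us, monitored {monitored * 1e6:.1f} us, "
      f"relative overhead {overhead / baseline:+.1%}")
```

For a short run like this, the measured difference can be dominated by noise; repeating the comparison several times gives a more reliable picture.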

What information is of interest and where the information is to be obtained depends on the
performance goals and the life cycle phase of an EA. During Dev, performance metrics are usually
derived using performance, load, or stress tests on a test system, whereas during Ops measurements
can be taken directly from a production system. The specifics of these activities are outlined in the
respective sections later in this report. However, it is important to understand that performance
measurements are highly dependent on the system and the workload used to collect them.
Therefore, results measured on one system are not directly applicable to another, different
system. This is also true for different workloads. Therefore, special care needs to be taken when
selecting workloads and test systems for measurement-based performance evaluations during
Dev.

An overview of available commercial performance monitoring tools is given by Gartner in its
annually published report titled “Gartner’s magic quadrant for application performance
monitoring” (Kowall and Cappelli, 2014). The current market leaders are Dynatrace (Dynatrace, 2015),
AppDynamics (AppDynamics, 2015), New Relic (New Relic, Inc., 2015), and Riverbed
Technology (Riverbed Technology, 2015). Additionally, free and open source performance monitoring
tools exist, e.g., Kieker (van Hoorn et al., 2012).


Even though a lot of monitoring tools are available, there are still a lot of challenges to 
overcome when measuring software performance: 

• The configuration complexity of monitoring tools is often very high and requires a lot of
expert knowledge.

• Monitoring tools lack interoperability, in particular with respect to data exchange and
accessing raw data.

• Selecting an appropriate monitoring tool requires a lot of knowledge of the particular
features, as some capabilities are completely missing from specific monitoring solutions.

• Measurement-based performance evaluation during development requires a representative
workload and usage profile (operational profile) to simulate users. In many cases, a
replication of the productive system is not available.

• Setting up a representative test system for measurement-based performance evaluations
outside of a production environment is often associated with too much effort and cost.

• The accuracy of measurement results is highly platform-dependent. For example, the exact
same measurement approach can exhibit completely different results on Linux and on
Windows.

• Selecting appropriate time frames to keep historical data during operations is quite
challenging. If the time period is too short, important data might be lost too quickly;
if the period is too long, a monitoring solution might run into performance problems
itself due to the large amount of data it needs to manage.


3.2 Model-Based Performance Evaluation 


Besides the measurement-based approach, the performance behavior of a system can be
evaluated using model-based approaches. So-called performance models allow for representing
performance-relevant aspects of software systems and serve as input for analytical solvers or
simulation engines. Model-based approaches enable developers to predict performance metrics.
This capability can be applied for various use cases within the life cycle of a software system,
e.g., for capacity planning or ad-hoc analyses. The procedure is depicted in Figure 3.1.


There are two forms of performance models available: analytical models and
architecture-level performance models. Common analytical models include Petri nets, Queueing Networks
(QNs), Queueing Petri Nets (QPNs), or Layered Queueing Networks (LQNs) (Balsamo et al.,
2004; Ardagna et al., 2014). Architecture-level performance models depict key
performance-influencing factors of a system’s architecture, resources, and usage (Brosig et al., 2011). The
UML Profile for Schedulability, Performance and Time (UML-SPT) (Object Management Group,
Inc., 2005), the UML Profile for Modeling and Analysis of Real-Time and Embedded Systems
(MARTE) (Object Management Group, Inc., 2011), the Palladio Component Model (PCM)
(Becker et al., 2009), and the Descartes Modeling Language (DML) (Kounev et al., 2014) are
examples of architecture-level performance models. The latter two models focus on performance
evaluation of component-based software systems and make it possible to evaluate the impact of different
influencing factors on software components’ performance, which are categorized by Koziolek
(2010) as follows:


• Component implementation: Several components can provide the same interface and
functionality, but may differ in their response time or resource usage.

• Required services: The response time of a service depends on the response time of its
required services.

• Deployment platform: Software components can be deployed on various deployment
platforms, which consist of different software and hardware layers.

• Usage profile: The execution time of a service can depend on the input parameter it was
invoked with.

• Resource contention: The execution time of software components depends on the time
they spend waiting for contended resources.

Figure 3.1: Model-based performance evaluation

Architecture-level performance models can be either simulated directly or automatically
translated into analytical models and then be processed by respective solvers. For instance, for PCM
models, different simulation engines and transformations to analytical performance models such
as LQNs or stochastic regular expressions exist. Analytical solvers and simulation engines have
in common that they allow for predicting performance metrics.
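As a minimal illustration of what an analytical solver computes, the sketch below evaluates the simplest queueing model, a single M/M/1 queue (Poisson arrivals, exponentially distributed service times, one server). This choice of model and the parameter values are assumptions made for illustration; the QN, QPN, and LQN formalisms named above are considerably more expressive.

```python
def mm1_metrics(arrival_rate, service_time):
    """Steady-state metrics of an M/M/1 queue.

    arrival_rate: mean number of requests per second (lambda)
    service_time: mean service time per request in seconds (1 / mu)
    """
    utilization = arrival_rate * service_time
    if utilization >= 1.0:
        raise ValueError("unstable queue: utilization must be below 1")
    response_time = service_time / (1.0 - utilization)
    return {
        "utilization": utilization,
        "throughput": arrival_rate,   # equals the arrival rate in a stable queue
        "response_time": response_time,
        # Little's law: N = X * R
        "mean_requests_in_system": arrival_rate * response_time,
    }


# Hypothetical workload: 40 requests/s, each needing 20 ms of service time.
metrics = mm1_metrics(arrival_rate=40.0, service_time=0.02)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these inputs the server is 80% utilized and the predicted mean response time is about 0.1 s, five times the pure service time, which shows how strongly queueing delay can dominate at high utilization.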

Performance models can be created automatically either based on running applications
(Brunnert et al., 2013b; Brosig et al., 2009, 2011) or based on design specifications. Regarding
the latter approach, they can be derived from a variety of different design specifications such as
the Unified Modeling Language (UML) including sequence, activity, and collaboration diagrams
(Petriu and Woodside, 2002, 2003; Woodside et al., 2005; Brunnert et al., 2013a), execution
graphs (Petriu and Woodside, 2002), use case maps (Petriu and Woodside, 2002), Specification
and Description Language (SDL) (Kerber, 2001), or object-oriented specifications of systems like
class, interaction, or state transition diagrams based on object-modeling techniques (Cortellessa
and Mirandola, 2000).


Performance models and the prediction of performance metrics provide the basis to analyze
various use cases, especially to support the DevOps approach. For instance, performance
models can be created during system development for new systems that are intended to replace legacy
systems. Their predicted metrics can then be compared with monitoring data of the existing
systems from IT operations and, e.g., allow for examining whether the new system is expected
to require fewer resources. Alternatively, performance models of existing systems can be
automatically derived from IT operations, for example, using the approach by Brunnert et al. (2013b)
for Java Enterprise Edition (Java EE) applications. Subsequently, design alternatives can be
evaluated regarding component specifications, software configurations, or system architectures.
This enables architects and developers to optimize an existing system for different purposes like
efficiency and performance, but also costs and reliability (Aleti et al., 2013). System developers are
also able to communicate performance metrics to IT operations and ensure a certain level of
system performance across the whole system life cycle.

Furthermore, model-based performance predictions can be applied to answer sizing
questions. System bottlenecks can be found in different places and, for instance, examined already
during system development. The ability to vary the workload in a model also allows evaluating
worst-case scenarios such as the impact of an increased number of users on a system in case of
promotional actions. In this way, a system’s scalability can be examined as well by specifying
increased data volumes that have to be handled by components, as may be the case in the
future.
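A sizing question of this kind can be sketched by varying the workload in a simple model. The example below assumes, purely for illustration, that the load is split evenly across identical servers and that each server behaves like an independent M/M/1 queue; the function names, the service time, and the response time goal are hypothetical.

```python
import math


def mm1_response_time(arrival_rate, service_time):
    """Mean response time of an M/M/1 queue, or infinity if it is overloaded."""
    utilization = arrival_rate * service_time
    if utilization >= 1.0:
        return math.inf
    return service_time / (1.0 - utilization)


def servers_needed(arrival_rate, service_time, response_time_goal, max_servers=1000):
    """Smallest number of identical servers that meets the response time goal."""
    for servers in range(1, max_servers + 1):
        per_server_rate = arrival_rate / servers   # even load balancing assumed
        if mm1_response_time(per_server_rate, service_time) <= response_time_goal:
            return servers
    raise ValueError("goal not reachable within max_servers")


# Normal load, a busier period, and a promotional peak (requests per second).
for rate in (40.0, 100.0, 400.0):
    count = servers_needed(rate, service_time=0.02, response_time_goal=0.05)
    print(f"{rate:.0f} req/s -> {count} server(s)")
```

Because the model is evaluated rather than measured, such what-if questions can be answered before the corresponding hardware or workload exists.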

Regarding the DevOps approach to combine and integrate activities from software
development and IT operations, model-based performance evaluation is, for instance, useful for
a) exchanging and comparing performance metrics during the whole system life cycle, b) optimizing
system design and deployment for a given production environment, and c) early performance
estimation during system development.


Selected challenges for model-based performance prediction include the following: 

• The representation of main memory as well as garbage collection is not yet explicitly
integrated and considered in performance models.

• The selection of appropriate solution techniques requires a lot of expertise. 

3.3 Performance and Workload Model Extraction 

Performance models and workload models have to be created before we can deduce performance
metrics. This section focuses on the extraction of architectural performance models, as they
combine the capabilities of architectural models (e.g., UML) and analytical models (e.g., QN).
Analytical models explicitly or implicitly assume a resource demand per service execution and
resource. However, unlike architecture-level performance models, they do not provide a natural
link between resource demands and software elements (e.g., components, operations), which is
useful for DevOps process automation. In traditional long-term design scenarios, models may be extracted
by hand. However, manual extraction is expensive, error-prone, and slow compared to automatic
solutions. Especially in contexts where Dev and Ops merge and the models frequently change,
automation is of great importance. The main goal of performance model extraction for DevOps
is to define and build an automated extraction process for architectural performance models.
Basically, architectural performance models provide a common set of features which have to be
extracted. We propose to structure the extraction into the following three extraction disciplines:

1. System structure and behavior, 

2. Resource demand estimation, 

3. Workload Characterization. 


These extraction disciplines can be combined into a complete extraction process and are explained
in the subsequent sections. Before that, we delimit the scope with respect to existing model
extraction approaches and outline general challenges. Some predictive models estimate service times
without linking resource demands to resources. Approaches targeting the extraction of such
black-box models using, for example, genetic optimization techniques (e.g., Westermann et al.
(2012) and Courtois and Woodside (2000)) are not considered in this report. These models serve
as an interpolation of the measurements. Neither a representation of the system architecture nor its
performance-relevant factors and dependencies are extracted. Approaches to automatically
construct analytical performance models, such as QNs, have been proposed, e.g., by Menascé et al.
(2005), Menascé et al. (2007), and Mos (2004). However, the extracted models are rather limited
since they abstract the system at a very high level without taking into account its architecture
and configuration. Moreover, the extraction often imposes restrictive assumptions such as
a single workload class or homogeneous servers. Others, like Kounev et al. (2011), assume the
model structure to be fixed and preset (e.g., modeled by hand), and only derive model
parameters using runtime monitoring data. Moreover, extraction software is often limited to a certain
technology stack (e.g., Oracle WebLogic Server (Brosig et al., 2009)). We identify the following
challenges and goals for future research on model extraction:


• The assessment of validity and accuracy of extracted models is often based on trial and
error. An improvement would be to equip models with confidence intervals.

• Models may lose accuracy if they are not updated when the system changes. Detection
mechanisms are required to learn when models get out of date and when to update them.

• Current performance modeling formalisms barely ensure the traceability between the
running system and model instances. With reference to DevOps, more traceability information
should be stored within the models.

• The automated inspection of the System Under Analysis often requires technology-specific
solutions. One solution, to enable less technology-dependent extraction tools, might be
self-descriptive resources using standardized interfaces.

• The extraction of performance capabilities is based on a combination of software and
the (hardware) resources it is deployed on. This combined approach supports prediction
accuracy but limits the portability of insights to other platforms. One
future research direction might be to extract separate models (e.g., separate middleware
and application models).

• Automated identification of an appropriate model granularity level. 

• Automated identification and extraction of parametric dependencies in call paths and 
resource demands. 
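One conceivable detection mechanism for outdated models, mentioned among the challenges above, is to compare model predictions with current measurements and to flag metrics whose relative error exceeds a tolerance. The sketch below is only an illustration of that idea under stated assumptions; the metric names, values, and the 20% tolerance are hypothetical and not proposed in this report.

```python
def relative_error(predicted, measured):
    """Relative prediction error with respect to the measured value."""
    return abs(predicted - measured) / measured


def stale_metrics(predictions, measurements, tolerance=0.2):
    """Return the metrics whose prediction error exceeds the tolerance."""
    stale = {}
    for name, predicted in predictions.items():
        measured = measurements.get(name)
        if measured is None or measured <= 0:
            continue   # nothing meaningful to compare against
        error = relative_error(predicted, measured)
        if error > tolerance:
            stale[name] = error
    return stale


# Hypothetical model predictions and monitoring data (mean response times in seconds).
predictions = {"checkout.response_time_s": 0.10, "search.response_time_s": 0.30}
measurements = {"checkout.response_time_s": 0.11, "search.response_time_s": 0.45}

stale = stale_metrics(predictions, measurements)
for name, error in stale.items():
    print(f"{name}: prediction off by {error:.0%}, model update recommended")
```

A non-empty result could trigger a re-extraction of the affected model parts, which is one way the Ops side could feed information back to Dev.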


3.3.1 System Structure and Behavior 


The extraction of structure and behavior describes the configuration of the system. We
subdivide the extraction into the extraction of a) software components, b) resource landscape and
deployment, and c) inter-component interactions.

Software systems that are assembled from predefined components may be represented by the
same components in a performance model (Wu and Woodside, 2004). Examples of components
predefined by the developer are web services, EJBs in Java EE applications (e.g., in Brunnert
et al. (2013b); Brosig et al. (2009, 2011)), IComponent extensions in .NET, or CORBA
components. Those extraction techniques depend on predefined components. Existing approaches
for software component extraction that are independent of predefined components target source code
refactoring in a classical development process. Examples of such reverse engineering tools and
approaches are FOCUS (Ding and Medvidovic, 2001), ROMANTIC (Chardigny
et al., 2008; Huchard et al., 2010), Archimetrix (von Detten, 2012), and SoMoX (Becker et al.,
2010; Krogmann, 2010). These approaches are either clustering-based, pattern-based, or
combine both. They all identify components as they should be according to several software metrics.
However, this does not necessarily correspond to the actual deployable structures, which is
required during operation. Consequently, the identified components may be deployable in multiple
parts. The reverse engineering approaches satisfy the ’Dev’ but not the ’Ops’ part of DevOps.
An automated approach that works for DevOps independent of predefined component definitions
is still an open issue. If no predefined components are provided, component definition requires
manual effort. For manual extraction the following guidelines can be applied: i) classes that
inherit from component interfaces (e.g., IComponent or EJBComponent) represent components,
ii) all classes that inherit from a base class belong to the same component, iii) if component
A uses component B, then A is a composite component including B. The major challenge of
component extraction is technology dependency. Currently, no tool that covers a wider range of
component technologies is known to the authors.

Besides software component identification one has to extract resource landscape and deployment.
Automated identification of hardware and software resources in a system environment
is already available in industry. For instance, Hyperic (2014) or Zenoss (2014) provide such
functionalities. Given a list of system elements, system, network and software properties can be
extracted automatically. Further, low-level aspects, like cache topology, can be extracted using
open source tools like LIKWID (Treibig et al. (2010)). The deployment, which is the mapping
of software to resources, can be extracted using service event logs. For each executed operation,
these logs usually contain (besides the execution time) identifiers that enable a mapping to the
corresponding software component and the machine it was executed on. The extraction creates
one deployment component per pair of software component and resource identifier found within
the event logs. The logging causes no additional effort as the logs are also required for resource
demand estimation. The extraction of a resource landscape in performance modeling is mostly
performed semi-automatically. We ascribe this mainly to technical challenges, e.g., integration
of information from different sources with many degrees of freedom
(network, CPU count and clock frequency, memory, middleware, operating systems).
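
The log-based deployment extraction can be sketched as follows. The record fields `component`, `host`, and `exec_time_ms` are assumed names for illustration; real event logs use tool-specific formats.

```python
def extract_deployment(event_log):
    """Create one deployment entry per distinct (component, host) pair
    found in the event log and aggregate its observed execution times."""
    deployment = {}
    for record in event_log:
        key = (record["component"], record["host"])
        entry = deployment.setdefault(key, {"invocations": 0, "total_time_ms": 0.0})
        entry["invocations"] += 1
        entry["total_time_ms"] += record["exec_time_ms"]
    return deployment


# Hypothetical log records.
log = [
    {"component": "OrderService", "host": "app-01", "exec_time_ms": 12.0},
    {"component": "OrderService", "host": "app-02", "exec_time_ms": 15.0},
    {"component": "Inventory", "host": "app-01", "exec_time_ms": 4.0},
    {"component": "OrderService", "host": "app-01", "exec_time_ms": 10.0},
]
deployment = extract_deployment(log)
```

The aggregated execution times are kept because the same logs also feed resource demand estimation.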

The extraction of interactions between components differs for design time and runtime. At
design time, models can be created using designer expertise and design documents (e.g., as
performed in Smith and Williams (2002a); Menasce and Gomaa (2000); Petriu and Woodside
(2002); Cortellessa and Mirandola (2000)). Commencing at a runnable state, monitoring logs
can be generated. Automated extraction of structural information based on monitoring logs has
the advantage that it tracks the behavior of the actual product as it evolves. An effective
architecture can be extracted, which means that only executed system elements are extracted
(Israr et al. (2007)). Further, runtime monitoring data enables the extraction of branching
probabilities for different call paths (Brosig (2014)). Selected approaches for control flow
extraction are by Hrischuk et al. (1999); Briand et al. (2006) and Israr et al. (2007). Hrischuk
et al. (1999) and Briand et al. (2006) use monitoring information based on probes which are
injected into the beginning of each response and propagated through the system. The approach of
Israr et al. (2007) requires no probe information but is unable to model synchronization delays
in parallel sub-paths which join together. In contemporary monitoring tools like DynaTrace,
AppDynamics or Kieker the probe-based approach has become standard.


We identify the following major challenges for structure and behavior extraction: 

• Component extraction customization effort for case studies. Portability of technology-dependent
component extraction approaches is low. Current technology-independent component
extraction approaches are not considered capable of a fully automated performance
model extraction.

• Monitoring customization effort for case studies. A complete extraction story requires
a lot of tools with different interfaces to be connected. In particular, the portability and
combination of multiple resource extraction approaches is a complex task.


3.3.2 Resource Demand Estimation 

In architecture-level performance models, resource demands are a key parameter to enable their 
quantitative analysis. A resource demand describes the amount of a hardware resource needed 
to process one unit of work (e.g., user request, system operations, or internal actions). The
granularity of resource demands depends on the abstraction level of the control flow in a performance
model. Resource demands may depend on the value of input parameters. This dependency can 
be either captured by specifying the stochastic distributions of resource demands or by explicitly 
modeling parametric dependencies. 

The estimation of resource demands is challenging as it requires a deep integration between 
application performance monitoring solutions and operating system resource usage monitors 
in order to obtain resource demand values. Operating system monitors often only provide 
aggregate resource usage statistics on a per-process level. However, many applications (e.g., web 
and application servers) serve different types of requests with one or more processes. 


Profiling tools (Graham et al., 1982; Hall, 1992) are typically used during development to
track down performance issues as well as to provide information on call paths and execution
times of individual functions. These profiling tools rely on either fine-grained code
instrumentation or statistical sampling. However, these tools typically incur high measurement
overheads, severely limiting their usage during production, and leading to inaccurate or biased
results. In order to avoid distorted measurements due to overheads, Kuperberg et al. (2008, 2009)
propose a two-step approach. In the first step, dynamic program analysis is used to determine the
number and types of bytecode instructions executed by a function. In the second step, the
individual bytecode instructions are benchmarked to determine their computational overhead.
However, this approach is not applicable during operations and fails to capture interactions
between individual bytecode instructions. APM tools, such as Dynatrace (2015) or AppDynamics
(2015), enable fine-grained monitoring of the control flow of an application, including timings of
individual operations. These tools are optimized to be also applicable to production systems.

Modern operating systems provide facilities to track the consumed CPU time of individual
threads. This information is, for example, also exposed by the Java runtime environment. It
can be exploited to measure the CPU resource consumption of processing individual
requests as demonstrated for Java by Brunnert et al. (2013b) and at the operating system level
by Barham et al. (2004). This requires application instrumentation to track which threads are
involved in the processing of a request. This can be difficult in heterogeneous environments using
different middleware systems, database systems, and application frameworks. The accuracy of
such an approach heavily depends on the accuracy of the CPU time accounting by the operating
system and the extent to which request processing can be captured through instrumentation.
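
As an illustration of thread-level CPU time accounting, the following sketch uses Python's `time.thread_time()` instead of the Java facilities used in the cited work. It assumes that a request is processed entirely within the calling thread; the handler functions `busy` and `idle` are hypothetical stand-ins for request processing.

```python
import time


def measure_cpu_demand(handler, *args):
    """Run handler in the current thread and return (result, cpu_seconds).

    Only the CPU time consumed by the calling thread is counted; work
    delegated to other threads or processes is not captured."""
    start = time.thread_time()
    result = handler(*args)
    return result, time.thread_time() - start


def busy(n):
    """CPU-bound stand-in for request processing."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def idle(seconds):
    """Stand-in for a request that mostly waits (e.g., on I/O)."""
    time.sleep(seconds)
    return seconds


_, busy_cpu = measure_cpu_demand(busy, 1_000_000)
_, idle_cpu = measure_cpu_demand(idle, 0.05)
```

The waiting request consumes almost no CPU time although its response time is at least 50 ms, which is why CPU demand and response time have to be measured separately.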

Over the years, a number of approaches to estimate the resource demands using statistical
methods have been proposed. These approaches are typically based on a combination of aggregate
resource usage statistics (e.g., CPU utilization) and coarse-grained application statistics
(e.g., end-to-end application response times or throughput). These approaches do not depend
on a fine-grained instrumentation of the application and are therefore widely applicable to
different types of systems and applications, incurring only insignificant overheads. Different
approaches from queuing theory and statistical methods have been proposed, e.g., response time
approximation (Brosig et al., 2009; Urgaonkar et al., 2007), least-squares regression (Bard and
Shatzoff, 1978; Rolia and Vetland, 1995; Pacifici et al., 2008), robust regression techniques
(Cremonesi and Casale, 2007; Casale et al., 2008), cluster-wise regression (Cremonesi et al.,
2010), Kalman Filter (Zheng et al., 2008; Kumar et al., 2009a; Wang et al., 2012), optimization
techniques (Zhang et al., 2002; Liu et al., 2006; Menasce, 2008; Kumar et al., 2009b), Support
Vector Machines (Kalbasi et al., 2011), Independent Component Analysis (Sharma et al., 2008),
Maximum Likelihood Estimation (Kraft et al., 2009; Wang and Casale, 2013; Perez et al., 2015),
and Gibbs Sampling (Sutton and Jordan, 2011; Perez et al., 2013). These approaches differ in their
required input measurement data, their underlying modeling assumptions, their output metrics,
their robustness to anomalies in the input data, and their computational overhead. A detailed
analysis and comparison is provided by Spinner et al. (2015). A Library for Resource Demand
Estimation (LibReDE) offering ready-to-use implementations of several estimation approaches
is described in Spinner et al. (2014).
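
To illustrate the least-squares regression idea, the following sketch assumes the utilization law U_k = sum over c of X_{k,c} * D_c, where U_k is the measured utilization in interval k, X_{k,c} the throughput of request class c, and D_c the unknown resource demand. It is a plain ordinary least-squares fit on synthetic, noise-free data, not an implementation of any of the cited estimators or of LibReDE.

```python
def solve_linear(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def estimate_demands(throughputs, utilizations):
    """Least-squares estimate of per-class resource demands (seconds).

    throughputs[k][c]: requests per second of class c in interval k
    utilizations[k]:   resource utilization in interval k, in [0, 1]
    Solves the normal equations (X^T X) D = X^T U."""
    classes = len(throughputs[0])
    xtx = [[sum(row[i] * row[j] for row in throughputs) for j in range(classes)]
           for i in range(classes)]
    xtu = [sum(row[i] * u for row, u in zip(throughputs, utilizations))
           for i in range(classes)]
    return solve_linear(xtx, xtu)


# Synthetic example: two request classes with assumed true demands.
throughputs = [[10.0, 2.0], [5.0, 6.0], [8.0, 4.0], [2.0, 9.0]]
true_demands = [0.02, 0.05]
utilizations = [sum(x * d for x, d in zip(row, true_demands)) for row in throughputs]
estimated = estimate_demands(throughputs, utilizations)
```

The fit requires sufficient variation in the workload mix across intervals; if the throughputs of the classes always change proportionally, the normal equations become singular and the demands cannot be separated.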


We identify the following areas of future research on resource demand estimation: 

• Current work is mainly focused on CPU resources. More work is required to address the
specifics of other resource types, such as memory, network, or Input/Output (I/O) devices.
The challenges with these resource types are, among others, that the utilization metric is
often not as clearly defined as for CPU, and the resource access may be asynchronous.

• Comparisons between statistical estimation techniques and measurement approaches are
missing. These would help to better understand their implications on accuracy and overhead.

• Most approaches are focused on estimating the mean resource demand. However, in order
to obtain reliable performance predictions it is also important to determine the correct
distribution of the resource demands.

• Modern system features (e.g., multi-core CPUs, dynamic frequency scaling, virtualization)
can have a significant impact on the resource demand estimation.

• Resource demand estimation techniques often require measurements for all requests during
a certain time period in which a resource utilization is measured, whereas resource demand
measurements can be applied for a selected set of transactions.


3.3.3 Workload Characterization and Workload Model Extraction 

Workload characterization is a performance engineering activity that serves to a) study the way
users (including other systems) interact with the System Under Analysis (SUA) via the
system-provided interfaces and to b) create a workload model that provides an abstract
representation of the usage profile (Jain, 1991).


Menasce and Almeida (2002) suggest decomposing the system’s global workload into workload
components (e.g., distinguishing web-based interactions from client/server transactions),
which are further divided into basic components. Basic components (e.g., representing
business-to-business transaction types or services invoked by interactive user interactions via
a web-based UI) are assigned workload intensity (e.g., arrival rates, number of user sessions over
time, and think times) and service demand (e.g., average number of documents retrieved per
service request) parameters.

The remainder of this section focuses on the extraction of workload characteristics related
to navigational profiles (Section 3.3.3.1) and workload intensity (Section 3.3.3.2).


3.3.3.1 Navigational Profiles For certain kinds of systems, the assumption of workload
being an arrival of independent requests is inappropriate. A common type of enterprise
applications are session-based systems. In these systems, users interact with the system in a
sequence of inter-related requests, each being submitted after an elapsed think time. The notion
of a navigational profile is used to refer to the session-internal behavior of users. The
navigational profile captures the possible ways or states of a workload unit (single
user/customer) through the system. Note that we do not limit the scope of the targeted systems
to web-based software systems but consider multi-user enterprise application systems in general.
The same holds for the notion of a request, which is not limited to web-based software systems.
One goal of the session-based notion is to group types of users with a similar behavior.

Figure 3.2: SPECjEnterprise2010 transaction types (browse, manage, and purchase) in a CBMG
(Markov chain) representation (van Hoorn et al., 2014). For example, in browse transactions,
users start with a login, followed by a view items request in 100% of the cases; view items is
followed by view items with a probability of 93% and by home in 7% of the cases.

Metrics and Characteristics. For session-based systems, workload characteristics can
be divided into intra-session and inter-session characteristics. Intra-session characteristics
include think times between requests and the session length, e.g., in terms of the time elapsed
between the first and the last request within a session and the number of requests within a
session. Inter-session characteristics include the number of sessions per user and the number
of active sessions over time as a workload intensity metric. Moreover, request-based workload
characteristics apply, e.g., the distribution of invoked request types observed from the server
perspective.
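
The intra-session metrics can be computed from a request log as in the following sketch. The tuple layout `(session_id, timestamp_s, request_type)` is an assumption for illustration. Because only request timestamps are used, the computed gap between requests includes the response time and therefore only approximates the think time.

```python
from collections import defaultdict


def session_metrics(requests):
    """Compute per-session length metrics from (session_id, timestamp_s,
    request_type) tuples."""
    sessions = defaultdict(list)
    for session_id, timestamp, _request_type in requests:
        sessions[session_id].append(timestamp)
    metrics = {}
    for session_id, stamps in sessions.items():
        stamps.sort()
        gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        metrics[session_id] = {
            "num_requests": len(stamps),
            "duration_s": stamps[-1] - stamps[0],
            # Upper bound on the think time: includes the response time.
            "mean_gap_s": sum(gaps) / len(gaps) if gaps else 0.0,
        }
    return metrics


# Hypothetical request log.
requests = [
    ("s1", 0.0, "login"),
    ("s1", 4.0, "view items"),
    ("s1", 10.0, "home"),
    ("s2", 3.0, "login"),
]
metrics = session_metrics(requests)
```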

Specification and Execution of Session-Based Workloads. Two different approaches
exist to specify session-based workloads, namely based on a) scripts and b) performance
models.

Script-based specification is supported by essentially every load testing tool. The workflow of
a single user (class) is defined in a programming-language style, sometimes even using
programming languages such as Java or C++ (e.g., HP LoadRunner). These scripts, representing a
single user, are then executed by a defined number of concurrent load generator threads. Even
though the scripting languages provide basic support for probabilistic paths, the scripts are
usually very deterministic.

As opposed to this, performance models provide an abstract representation of a user session,
usually including probabilistic concepts. An often-used formalism for representing navigational
profiles in session-based systems are Markov chains (Menasce et al., 1999; van Hoorn et al.,
2008; Li and Tian, 2003), i.e., probabilistic finite state machines. For example, Menasce et al.
(1999) introduce a formalism based on Markov chains, called Customer Behavior Model
Graphs (CBMGs). In a CBMG, states represent possible interactions with the SUA. Transitions
between states have associated transition probabilities and average think times. For example,
Figure 3.2 depicts CBMGs for transaction types of a (modified) workload used by the
industry-standard benchmark SPECjEnterprise2010 (van Hoorn et al., 2014). CBMGs can be used for
workload generation (Menasce, 2002). As emphasized by Krishnamurthy et al. (2006) and Shams
et al. (2006), limitations apply when using CBMGs for workload generation. For example, the
simulation of the Markov chain may lead to violations of inter-request dependencies, i.e., to
sequences of requests that do not respect the protocol of the SUA. Two items, for instance, may
be removed from a shopping cart, even though only a single item has been added before. To
support inter-request dependencies (and data dependencies), Shams et al. (2006) propose a workload
modeling approach based on (non-deterministic) Extended Finite State Machines (EFSMs). An
EFSM describes valid sequences of requests within a session. As opposed to CBMGs, transitions
are not labeled with probabilities but with predicates and actions based on pre-defined state
variables. The actual workload model is the combination of valid sessions obtained by simulating
an EFSM along with additional workload characteristics like session inter-arrival times, think
times, session lengths, and a workload mix modeling the relative frequency of request types. Van
Hoorn et al. (2008, 2014) combine the aforementioned modeling approaches based on CBMGs
and EFSMs. Other approaches based on analytical models employ variants of Markov chains
(Barber, 2004b), Probabilistic Timed Automata (Abbors et al., 2013) and UML State Machines
(Becker et al., 2009; Object Management Group, Inc., 2005, 2013).


Extraction. Extractions of navigational profiles are usually based on request logs obtained
from a SUA. For web-based systems, these logs (also referred to as access logs or session logs)
usually include for each request the identifier of the requested service, a session identifier (if
available), and timing information (time of request and duration). Data mining techniques,
such as clustering, are used to extract the aforementioned formal models (Menasce et al., 1999;
van Hoorn et al., 2014).
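
The core of such an extraction, estimating transition probabilities of a Markov chain from observed sessions, can be sketched as follows. This is a plain first-order frequency estimate over all sessions; the clustering of sessions into behavior groups and the think times used in the cited approaches are omitted, and the artificial states `$start` and `$exit` are naming assumptions.

```python
from collections import Counter, defaultdict


def extract_transition_probabilities(sessions):
    """Estimate first-order transition probabilities between request types.

    sessions: list of request-type sequences, one sequence per session."""
    counts = defaultdict(Counter)
    for sequence in sessions:
        path = ["$start"] + list(sequence) + ["$exit"]
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(successors.values()) for dst, n in successors.items()}
        for src, successors in counts.items()
    }


# Hypothetical sessions reconstructed from a request log.
sessions = [
    ["login", "view items", "view items", "home"],
    ["login", "view items", "home"],
]
probabilities = extract_transition_probabilities(sessions)
```

For these two sessions, `view items` is followed by `home` in two of three observed transitions and by `view items` in the remaining one.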


We identify the following challenges and future directions for navigational profiles as part of 
workload characterization: 


• Most methods so far have focused on extracting and characterizing navigational profiles
offline. Promising future work is to perform such extraction and characterization continuously,
e.g., to use the gathered information for near-future predictions which do not only
take the workload intensity into account.

• Navigational profiles, or workload scripts in general, become outdated very quickly due to
changes in the evolving SUA. This concerns the expected usage pattern for the application
but also protocol-level details in the interaction (service identifiers, parameters, etc.). A
future direction could be to utilize navigational profiles already during development.


3.3.3.2 Load Intensity Profiles A load intensity profile definition is a crucial element to
complete a workload characterization. The observed or estimated arrival process of transactions
(on the level of users, sessions or requests/jobs arrivals) needs to be specified. As a basis
for specifying time-dependent arrival rates or inter-arrival times, the extraction of a usage
model (see Section 3.3.3.1) should provide a classification of transaction types that are
statistically indistinguishable in terms of their resource demanding characteristics. A load
intensity profile is an instance of an arrival process. A workload that consists of several
types of transactions is then characterized by a set of load intensity profile instances.

Load intensity profiles are directly applicable in the context of any open workload scenario
with a theoretically unlimited number of users, but are not limited to those (Schroeder et al.,
2006). In a closed or partially closed workload scenario, with a limited number of active
transactions, the arrival process can be specified between zero and the given upper limit. Any
load intensity profile can be transformed into a time series containing arrival rates per sampling
interval.
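
The transformation into a time series of arrival rates can be sketched as follows, assuming the profile is available as a list of arrival timestamps in seconds. The function and parameter names are illustrative.

```python
def arrival_rate_series(arrival_times, interval_s, duration_s):
    """Count arrivals per sampling interval and convert the counts to
    arrival rates (arrivals per second)."""
    n = int(duration_s // interval_s)
    counts = [0] * n
    for t in arrival_times:
        index = int(t // interval_s)
        if 0 <= index < n:
            counts[index] += 1
    return [count / interval_s for count in counts]


# Hypothetical arrival timestamps (seconds) over a 10 s observation period.
rates = arrival_rate_series([0.5, 1.2, 1.9, 4.1, 9.9], interval_s=2.0, duration_s=10.0)
```

The choice of the sampling interval is a trade-off: short intervals preserve bursts but yield noisy rates, whereas long intervals smooth both.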

A requirement for a load profile to appear as realistic (and not synthetic) for a given
application domain is a mixture of a) one or more (overlaying) seasonal patterns, b) long term
trends including trend breaks, c) characteristic bursts, and d) a certain degree of noise. These
components can be combined additively or multiplicatively over time as visualized in Figure 3.3.

Figure 3.3: Elements of load intensity profiles: seasonal, trends and breaks, overlaying seasonal,
and burst (von Kistowski et al., 2014)
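
How these elements combine into an arrival rate can be illustrated with a small sketch. All shapes and parameter values (daily sine pattern, linear trend, one-hour burst, Gaussian noise) are invented for illustration and do not follow any particular load intensity modeling language.

```python
import math
import random


def load_intensity(t_hours, rng, multiplicative=False):
    """Arrival rate at time t_hours, combined from seasonal, trend, burst,
    and noise elements. All parameter values are illustrative assumptions."""
    seasonal = 100.0 + 50.0 * math.sin(2 * math.pi * t_hours / 24.0)  # daily pattern
    burst = 200.0 if 36.0 <= t_hours < 37.0 else 0.0                  # one-hour burst
    noise = rng.gauss(0.0, 5.0)
    if multiplicative:
        base = seasonal * (1.0 + 0.01 * t_hours)  # trend scales the seasonal part
    else:
        base = seasonal + 0.5 * t_hours           # trend is added to the seasonal part
    return max(0.0, base + burst + noise)


rng = random.Random(42)  # fixed seed for reproducibility
series = [load_intensity(t, rng) for t in range(48)]
```

With a multiplicative trend the amplitude of the seasonal pattern grows over time, whereas an additive trend only shifts its level.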


At early development stages, load intensity profiles can be estimated by domain experts
by defining synthetic profiles using statistical distributions or mathematical functions. At a
higher abstraction level, the Descartes Load Intensity Model (DLIM) allows the seasonal, trend,
burst, and noise elements to be defined descriptively in a wizard-like manner (von Kistowski
et al., 2014). DLIM is supported by a tool-chain named Load Intensity Modeling Tool (LIMBO)
(Descartes Research Group, 2015). A good starting point for a load intensity profile definition
at the development stage is to analyze the load intensity of comparable systems within the same
domain. If traces from comparable systems are available, a load profile model can be extracted
in a semi-automated manner as described by von Kistowski et al. (2015).


We identify the following open challenges in the field of load profile description and their 
automatic extraction: 

• Seasonal patterns may overlay (e.g., weekly and daily patterns) and change in their shape 
over time. The current extraction approaches do not fully support these scenarios. 


4 Software Performance Engineering During Development 

This section focuses on how a combination of model-based and measurement-based techniques
can support performance evaluations during software development. First, we will focus on the
challenges of how to conduct meaningful performance analyses in stages where no implementation
exists (Section 4.1). During development, timely feedback and guidance on performance-relevant
properties of implementation decisions is extremely valuable to developers, e.g., concerning the
selection of algorithms or data structures. Support for this is provided under the umbrella of
performance awareness, presented in Section 4.2. Next, in Section 4.3, we present approaches
to detect performance problems automatically based on analyzing design models and runtime
observations. Finally, Section 4.4 focuses on the automatic detection of performance regressions,
including approaches that combine measurements and model-based performance prediction.


4.1 Design-Time Performance Models 

In early software development phases like the design phase, many architectural, design and
technology decisions must be made that can have a significant influence on the performance
during operations (Koziolek, 2010). However, predicting the influence of design decisions on
the performance is difficult in early stages. Many questions arise for software developers and
architects during these phases. These questions include but are not limited to:


• What influence does a specific design decision have on performance? 

• How scalable is the designed software architecture? 

• Given the performance of reused components/systems, can the performance goals be 
achieved? 


In the early software development phases, performance models can be used as an "early warning"
instrument to answer these questions (Woodside et al., 2007). The goal of using performance
models is to support performance-relevant architecture design decisions. Using performance
models in early development phases should also motivate software developers to engage in early
performance discussions like answering what-if questions (Thereska et al., 2010). A what-if
question describes a specific case for a design or architectural decision, such as "What happens
if we use client-side rendering?" as an alternative to "What happens if we use server-side rendering?".
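
A what-if question of this kind can be illustrated with a deliberately simple model. The sketch below assumes that the server behaves like a single M/M/1 queue, for which the mean response time is R = D / (1 - λD) with arrival rate λ and service demand D. Neither this model choice nor the demand values come from the cited work; they are assumptions for illustration only.

```python
def mm1_response_time(arrival_rate, service_demand):
    """Mean response time of an M/M/1 queue: R = D / (1 - lambda * D)."""
    utilization = arrival_rate * service_demand
    if utilization >= 1.0:
        raise ValueError("resource saturated: utilization >= 1")
    return service_demand / (1.0 - utilization)


# Assumed server CPU demand per request (seconds) for each design alternative.
alternatives = {
    "server-side rendering": 0.040,
    "client-side rendering": 0.015,
}
arrival_rate = 20.0  # assumed requests per second
predictions = {
    name: mm1_response_time(arrival_rate, demand)
    for name, demand in alternatives.items()
}
```

Under these assumptions the server-side alternative runs at 80% utilization and its predicted response time is roughly nine times higher, although its demand is less than three times higher. This non-linear effect of contention is what makes such models useful for early design discussions.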

In order to create performance models, information about the system’s architecture (i.e.,
scenarios describing system behavior and deployment), workloads (see Section 3.3.3), and resource
demands (see Section 3.3.2) is needed. However, in early phases of the software development it
is challenging to create accurate performance models as the software system is not yet in
production and it is difficult to collect and identify all required empirical information. In
particular, it is difficult to get data about the workload and the resource demands of the
application. In order to face these challenges, Barber (2004a) proposes activities to gather
that kind of data:


• First of all, available production data from existing versions of the software or from external
services required for the new system should be analyzed. Using this data the workload,
usage scenarios, and resource demands can be extracted.

• If no production data is available, design and requirements documents can be analyzed
regarding performance expectations for new features. In particular, Service-level Agreements
(SLAs) can be used as an approximation for the performance of external services until
measurements are available.

• The resource demands can be estimated (CPU, I/O, etc.) based on the judgments of
software developers.

• After the identification of the available information about the design, workload and resource
demands, one or more drafts of the performance model should be created and first
simulation results should be derived. The results can then be presented and discussed
with developers and architects. Then the models should be continuously improved based
on further experiments, feedback, results, and common sense.


The challenge of using performance models in early development phases is that it is often difficult
to validate the accuracy of the models until a running system exists. Performance predictions
based on assumptions, interviews, and pretests can also be inaccurate, and so can the
decisions based on these predictions. However, a model helps to capture all the collected data in
a structured form (Brunnert et al., 2013a). Considering all these aspects, the following challenges
exist for performance evaluations in early development phases that need to be addressed:


• The trust of the architects and developers in these models can be very low. It is therefore 
important to make the modeling assumptions and data sources for these models transparent 
to their users. 

• Design models may be incomplete or inconsistent, especially in the early stages of software 
development. 

• Modern agile software development processes have the goal to start with the implementation
as early as possible and thus may skip the design step. For such methodologies, the
approaches outlined in the following section might be more applicable.


4.2 Performance Awareness 


Due to time constraints during software development, non-functional aspects with respect to
software quality (e.g., performance) are often neglected. Performance testing requires realistic
environments, access to test data, and implies the application of specific tools. Continuously
evaluating the performance of software artifacts also decreases the productivity of developers.
For quality aspects such as code cleanness or bug pattern detection, a number of automatic tools
supporting developers exist. Well-known examples are Checkstyle¹ and FindBugs². Tools that
focus on performance aspects are not yet widespread, but would be very useful in providing
awareness on the performance of software to developers.

Performance awareness describes the availability of insights on the performance of software
systems and the ability to act upon them (Tuma, 2014). Tuma divides the term performance
awareness into four different dimensions:


1. The awareness of performance-relevant mechanisms, such as compiler optimizations,
supports understanding the factors influencing performance.

2. The awareness of performance expectations aims at providing insights on how well software 
is expected to perform. 

3. Performance awareness also intends to support developers with insights on the performance 
of software they are currently developing. 

4. Performance-aware applications are intended to dynamically adapt to changing conditions. 

The most relevant perspectives for DevOps are the performance awareness of developers and
the awareness of performance expectations. Gaining insights into the performance of the code
they are currently developing is an increasingly difficult task for developers. Large application
system architectures, a continuous iteration between system life cycle phases, and complex IT
governance represent great challenges in this regard. Complex system of systems architectures
often imply a geographical, cultural, organizational, and technical variety. The structure,
relationships, and deployment of software components are often not transparent to developers.
Due to continuous iterations between development and operations, the performance behavior of
components is subject to constant change. The responsibility for components is also distributed
across different organizational units, increasing the difficulty to access monitoring data.
Additionally, developers require knowledge in performance engineering and in using corresponding
tools. A number of approaches propose automated and integrated means to overcome these
challenges and support developers with performance awareness. Selected existing approaches
either provide performance measurements (Heger et al., 2013; Horky et al., 2015; Bures et al.,
2014) or performance predictions (Weiss et al., 2013; Danciu et al., 2014) to developers. A brief
overview of these approaches is provided below.

¹ http://checkstyle.sourceforge.net
² http://findbugs.sourceforge.net

Measurement-based approaches collect performance data during unit tests or during runtime.
Heger et al. (2013) propose an approach based on measurements collected during the execution
of unit tests that integrates performance regression root cause analysis into the development
environment. When regressions are detected, the approach supports the developer with
information on the change and the methods causing the regression. The performance evolution of
the affected method is presented graphically as a function and the methods causing the
regression are displayed. Horky et al. (2015) suggest enhancing the documentation of software
libraries with information on their performance. The performance of libraries is measured using
unit tests. Tests are executed on demand once the developer looks up a specific method for
the first time. Tests can be executed locally or on remote machines. Measurements are then
cached and refined iteratively. The approach proposed by Bures et al. (2014) integrates
performance evaluation and awareness methods into different phases of the development process
of autonomic component ensembles. High-level performance goals are formulated during the
requirement phase. As soon as software artifacts become deployable, the actual performance is
measured. Developers receive feedback on the runtime performance within the Integrated
Development Environment (IDE). Measurements are represented graphically as functions within a
pop-up window.

At runtime it may be unclear whether the observed behavior also reflects the
expected one. Approaches for supporting the awareness of performance expectations provide
a means to formulate, communicate, and evaluate these expectations. Bulej et al. (2012)
propose the usage of the Stochastic Performance Logic (SPL) to express performance assumptions
for specific methods in a hardware-independent manner. Assumptions on the performance of a
method are formulated relative to another method and are not specified in time units. At
runtime, assumptions are evaluated and potential violations can be reported to the developer.
Bures et al. employ SPL during design to capture performance goals and assign them to individual
methods. These assumptions are then tested during runtime.
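
The idea of a relative, hardware-independent performance assumption can be sketched as follows. The sketch only compares sample means and is a strong simplification; it does not reproduce the syntax or the statistical evaluation of SPL, and the function name and sample values are hypothetical.

```python
from statistics import mean


def holds_relative_assumption(samples, baseline_samples, factor):
    """Check the assumption 'the method is at most `factor` times slower
    than the baseline method' on measured execution times.

    Simplification: only sample means are compared, without any test of
    statistical significance."""
    return mean(samples) <= factor * mean(baseline_samples)


# Hypothetical measurements (milliseconds) taken on the same machine.
sort_times = [2.0, 2.2, 2.1]
copy_times = [1.0, 1.1, 0.9]
within_factor_3 = holds_relative_assumption(sort_times, copy_times, 3.0)
within_factor_2 = holds_relative_assumption(sort_times, copy_times, 2.0)
```

Because both methods are measured on the same machine, the assumption remains meaningful when the code moves to faster or slower hardware, which an absolute time limit would not.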

Model-based prediction approaches aim at supporting developers with insights on the
performance of software before it is deployed. The approach by Weiss et al. (2013) evaluates the
performance of persistence services based on tailored benchmarks during the implementation
phase. The approach enables developers to track the performance impact of changes or to
compare different design alternatives. Results are displayed within the IDE as numerical values
and graphically as bar charts. The approach is only applicable for Java Persistence API (JPA)
services, but instructions on how to design and apply benchmark applications to other components
are also provided by the authors. The approach proposed by Danciu et al. (2014) focuses on
the Java EE development environment. The approach supports developers with insights on the
expected response time of component operations they are currently implementing. Estimations
are performed based on the component implementation and the behavior of required services.


We identify the following open challenges and future research directions in the field of
performance awareness:

• Current approaches mainly focus on providing insights on performance but omit enabling
developers to act upon them. Future research should investigate how guidance for correcting
problems could be provided.

• Insights compiled for the specific circumstances of developers should be collected and
exchanged with Ops. Thus, new trends can be identified and corresponding measures can
be taken.

• The acceptance of performance awareness approaches by developers needs to be evaluated
more extensively and improved. Increasing the acceptance will foster the diffusion of these
approaches into industry.

• The performance improvements which can be achieved by employing performance awareness
approaches need to be evaluated using industry scenarios.


4.3 Performance Anti-Pattern Detection 


Software design patterns (Gamma et al., 1994) provide established template-like solutions to
common design problems to be used consciously during development. By contrast, software
performance anti-patterns (Smith and Williams, 2000) constitute design, development, or
deployment mistakes with a potential impact on the software’s performance. Hence, anti-patterns
are primarily used as a feedback mechanism for different stakeholders of the software engineering
process (e.g., software architects, developers, and system operators). Approaches for
the detection of performance anti-patterns constitute the basis for such anti-pattern-based feedback.
There are different approaches for performance anti-pattern detection utilizing different
detection methodologies, requiring different types of artifacts and being applicable in different phases
of the software engineering process. Some of these approaches can be integrated into IDEs and
in this way support performance awareness (see Section 4.2) by providing direct feedback on
occurring performance mistakes.


4.3.1 The Essence of Performance Anti-Patterns 


There is a large body of scientific and industrial literature describing different performance
anti-patterns (Smith and Williams, 2000, 2002b,c, 2003; Dudney et al., 2003; Dugan et al., 2002;
Boroday et al., 2005; Tene, 2015; Reitbauer, 2010; Grabner, 2010; Kopp, 2011; Still, 2013). All
definitions of performance anti-patterns have in common that they describe circumstances that
may lead to performance problems under certain load situations. However, the definitions of
performance anti-patterns conceptually differ with respect to different dimensions. While some
anti-patterns describe mistakes on the architecture level (e.g., Blob anti-pattern (Smith and
Williams, 2000)), others refer to problems on the implementation level (e.g., Spin Wait anti-pattern
(Boroday et al., 2005)) or even deployment-related problems (e.g., Unbalanced Processing
(Smith and Williams, 2002b)). Furthermore, definitions of anti-patterns differ in the level of
abstraction. While some anti-patterns describe high-level symptoms of performance problems (e.g.,
The Ramp anti-pattern (Smith and Williams, 2002b)), other anti-patterns describe
application-internal indicators or even root causes (e.g., Sisyphus Database Retrieval (Dugan et al., 2002)).
Furthermore, anti-patterns may describe structural (e.g., Blob anti-pattern (Smith and Williams,
2000)) or behavioral patterns (e.g., Empty Semi Trucks anti-pattern (Smith and Williams, 2003)).
Depending on the types of anti-patterns, different detection approaches are more or less suitable
for their detection.


Section 4. Software Performance Engineering During Development 


4.3.2 Detection Approaches 

Approaches for the detection of performance anti-patterns can be divided into model-based
approaches and measurement-based approaches. These categories of approaches imply different
circumstances under which they can be applied, yielding different limitations and benefits. In
the following, we briefly discuss the two categories of detection approaches.


Model-based Approaches Model-based approaches for the detection of performance
anti-patterns (Trubiani and Koziolek, 2011; Cortellessa and Frittella, 2007; Xu, 2012; Cortellessa
et al., 2010) require architectural (e.g., PCM or MARTE) or analytic (e.g., QN) performance
models. Formalizing anti-patterns as rules (e.g., using predicate logic (Trubiani and Koziolek,
2011)) makes it possible to capture structural as well as behavioral
aspects of performance anti-patterns. While certain structural and behavioral aspects can be
evaluated directly on the architectural model, associated performance-relevant runtime aspects
can be derived by performance model analysis or simulation. Applying anti-pattern detection
rules to the models makes it possible to identify flaws in the architectural design that may impair
software performance. Due to the abstraction level of architectural models, the detection scope of
model-based approaches is inherently limited to architecture-level anti-patterns. In particular,
performance anti-patterns that are manifested in the details of implementation cannot be
detected by model-based approaches. Furthermore, due to the high dependency on the models,
the detection accuracy of model-based anti-pattern detection approaches is tightly coupled to
the quality (i.e., accuracy and representativeness) of the architectural models.


Measurement-Based Approaches Depending on their stage of usage in the software
lifecycle, measurement-based approaches can be further divided into test-based and operation-time
anti-pattern detection approaches:

Test-based anti-pattern detection approaches utilize performance tests (e.g., as part of
integration testing (Jorgensen and Erickson, 1994)) to gather performance measurement data as a
basis for further reasoning on existing performance problems (Wert et al., 2013, 2014; Grechanik
et al., 2012). Thereby, measurement-based approaches (see Section 3.1) are applied to retrieve
performance data of interest. As monitoring tools introduce measurement overhead that may
impair the accuracy of measurement data, systematic experimentation (Westermann, 2014; Wert
et al., 2013) can be applied to deal with the trade-off between accurate measurement data and
a high level of detail of the data. Similarly to the model-based approaches, test-based detection
approaches apply analysis rules that evaluate the measurement data to identify potential
performance anti-patterns. As test-based detection approaches rely on execution of the system under
test, a testing environment is required that is representative of the actual production
environment. In order to save costs, the testing environment is often considerably smaller than the
production environment. As detection of performance anti-patterns is often relative to the
performance requirements, in these cases, performance requirements need to be scaled down to the
size of the testing environment in order to allow reasonable, test-based detection of performance
anti-patterns. Furthermore, test-based detection approaches utilize load scripts for load
generation during execution of performance tests. Hence, the detection accuracy highly depends on the
quality (i.e., representativeness of real users) of the load scripts. The DevOps paradigm is the key
enabler to derive representative load scripts for test-based anti-pattern detection from
production workloads (Section 3.3.3). Test-based approaches analyze the implemented target system
in its full level of detail. They potentially cover all types of anti-patterns: from
architecture-level via implementation-level through to structural as well as behavioral anti-patterns. Finally,
test-based approaches that run fully automatically (Wert et al., 2013, 2014) can be integrated into
CI (Duvall et al., 2007) in order to provide frequent, regular feedback on the potential existence of
performance anti-patterns in the code of the target application.
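
As an illustration of an analysis rule that evaluates measurement data, the following Python
sketch flags a possible instance of The Ramp anti-pattern when response times grow over the
duration of a test run. It is a minimal sketch under stated assumptions and not the detection
logic of any cited approach: the function names, the least-squares trend, and the fixed growth
threshold of 50% are all hypothetical choices.

```python
def fit_trend(timestamps, response_times):
    """Least-squares line through the measurements; returns (intercept, slope)."""
    n = len(timestamps)
    mean_t = sum(timestamps) / n
    mean_r = sum(response_times) / n
    sxx = sum((t - mean_t) ** 2 for t in timestamps)
    sxy = sum((t - mean_t) * (r - mean_r) for t, r in zip(timestamps, response_times))
    slope = sxy / sxx
    return mean_r - slope * mean_t, slope


def looks_like_ramp(timestamps, response_times, min_relative_growth=0.5):
    """Flag a run whose fitted response time grows by at least the given fraction.

    The threshold is a fixed, system-specific value chosen for this sketch.
    """
    intercept, slope = fit_trend(timestamps, response_times)
    fitted_start = intercept + slope * timestamps[0]
    fitted_end = intercept + slope * timestamps[-1]
    if fitted_start <= 0:
        return False  # degenerate fit; this sketch does not judge such data
    return (fitted_end - fitted_start) / fitted_start >= min_relative_growth


# Hypothetical response times (ms), one value per minute of a constant-load test.
minutes = list(range(10))
growing = [100 + 20 * m for m in minutes]
stable = [100, 102, 99, 101, 100, 98, 101, 100, 99, 100]

print(looks_like_ramp(minutes, growing))  # True
print(looks_like_ramp(minutes, stable))   # False
```

A rule of this kind only reports a symptom. Whether the growth is caused, for example, by an
increasing amount of persisted data has to be established by further analysis.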


Operation-time anti-pattern detection approaches, e.g., by Parsons and Murphy (2004), are
similar to test-based approaches with respect to the detection methodology. However, as they
are applied on production system environments, they entail additional limitations as well as
benefits. In a production environment, the measurement overhead induced by monitoring tools
is a much more critical factor than with test-based approaches, as the monitoring overhead
must not noticeably affect the performance of real user requests. Therefore, operation-time
anti-pattern detection approaches apply rather coarse-grained monitoring, which affects the ability
of providing detailed insights on specific root causes of performance problems. Furthermore,
performance anti-patterns that are detected by operation-time approaches might already have
resulted in a performance problem experienced by end users. Hence, operation-time detection
of performance anti-patterns is rather reactive. However, as performance characteristics are
investigated on the real system, under real load, operation-time approaches are potentially more
accurate than model-based or test-based approaches.


We see the following research challenges in the area of performance anti-pattern detection: 

• Anti-patterns are usually described in textual format. More work is needed to formalize
these descriptions into machine-processable rules and algorithms.

• Many approaches use fixed thresholds in their detection rules and algorithms, e.g., in order
to judge whether a number of remote communications is indicative of a performance
problem. This leads to context-specific and system-specific configurations. More research
is needed to automatically determine suitable thresholds or to completely avoid them.

• More research is also needed to better understand and formalize the relationship between
symptoms, indicators, and root causes connected to performance anti-patterns and
performance problems in general.

4.4 Performance Change Detection 

A key goal of a tighter integration between development and operations teams is to better cope 
with changes in the business environment. In order to react quickly in such situations, a high 
release frequency is necessary. This changes the typical software release process in a way that, 
instead of releasing new features or bug fixes in larger batches in a few major versions, they are 
released more frequently in many minor releases. 

The performance characteristics of EAs can change whenever new features or bug fixes are
introduced in new versions. For this reason, it is necessary to continuously evaluate the
performance of EA versions to detect performance changes before an EA version is moved to
production. The previously introduced approaches during design and development cannot
capture all performance-related changes. Activities during the design phase might not capture such
changes because minor bug fixes or feature additions do not change the design and are thus not
detectable. During the implementation phase, only changes that are caused directly by
the code of an EA are detectable. Therefore, changes that are only detectable on different hardware
environments (e.g., in different deployment topologies) or in specific workload scenarios must be
analyzed before an EA version is released.

In order to analyze such performance-related changes of EA deployments on different
hardware environments or for multiple workload scenarios, measurement- and model-based
performance evaluation techniques can be used. Measuring the performance of each EA version is
often not feasible because maintaining appropriate test environments for all possible hardware
environments and workloads is associated with a lot of cost and effort. Therefore, performance
change detection techniques that build on a mixture of model- and measurement-based
performance evaluation approaches are introduced in the following.
Figure 4.1: Detecting performance change in a deployment pipeline (Brunnert and Krcmar, 2014)


A measurement-based technique that can be used to detect performance changes at a low
level of detail is performance unit testing (Horky et al., 2015). The key idea of such performance
unit tests is to ensure that regressions introduced by developer check-ins are quickly discovered.
One possible implementation of performance unit testing is to use existing APM solutions during
the functional unit tests. In order to make the developers aware of any performance regressions
introduced by their changes in new EA versions, an integration of such tests in CI systems is often
proposed (Waller et al., 2015). A measurement-based approach that requires test environments
that are comparable to the final production systems is proposed by Bulej et al. (2005). The
authors propose to use application-specific benchmarks to test the performance for each release.
Using such benchmarks has the advantage of repeatability and also makes the results more
robust compared to single performance tests. Another measurement-based approach to detect
performance regressions during development is proposed by Nguyen et al. (2012). The authors
propose to use so-called control charts in order to detect performance changes between two
performance tests. Similar to this approach are the works by Cherkasova et al. (2008, 2009) and
Mi et al. (2008). The authors propose the use of so-called application signatures. Application
signatures describe the response time of specific transactions relative to the resource utilization.
However, application signatures are intended to find performance changes for systems that are
already in production.
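
The control chart idea can be sketched in a few lines of Python. The sketch assumes a
Shewhart-style chart whose control limits lie three standard deviations around the mean of a
baseline test run, and it reports the share of measurements of a new run that fall outside these
limits. The function names, the limits, and the sample values are illustrative assumptions and
do not reproduce the exact procedure of Nguyen et al. (2012).

```python
import statistics


def control_limits(baseline, sigmas=3.0):
    """Lower and upper control limit derived from a baseline test run."""
    center = statistics.mean(baseline)
    spread = statistics.stdev(baseline)
    return center - sigmas * spread, center + sigmas * spread


def violation_ratio(baseline, target, sigmas=3.0):
    """Fraction of target-run measurements outside the baseline control limits."""
    lower, upper = control_limits(baseline, sigmas)
    outside = sum(1 for value in target if value < lower or value > upper)
    return outside / len(target)


# Hypothetical response times (ms) from performance tests of two builds.
baseline_run = [100, 102, 98, 101, 99, 100, 103, 97]
unchanged_build = [101, 99, 104, 96]
regressed_build = [120, 118, 105, 125]

print(violation_ratio(baseline_run, unchanged_build))  # 0.0
print(violation_ratio(baseline_run, regressed_build))  # 0.75
```

A high violation ratio indicates that the new build behaves differently from the baseline. Which
ratio should fail a build remains a project-specific decision.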

A model-based performance change detection process within a deployment pipeline, depicted
in Figure 4.1, is proposed by Brunnert and Krcmar (2014). This approach uses monitoring
data collected during automated acceptance tests in order to create models (called resource
profiles). These resource profiles describe the resource demand for each transaction of an EA
and are managed in a versioning repository in order to be able to access the resource profiles of
previous builds. The resource profile of the current EA version is used to predict performance for
predefined hardware environments and workloads. The same predictions are
derived from resource profiles of one or several previous versions. The prediction results are
compared with each other, and differences are used as an indicator for change. The resource profiles
themselves and the check-ins that triggered a build are then analyzed in order to find the source of a change
(e.g., by comparing two resource profiles).
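
The comparison of predictions derived from two resource profiles can be illustrated with a
strongly simplified Python sketch. It assumes that a resource profile only contains the CPU
demand per transaction and that the prediction is a plain application of the utilization law.
The approach of Brunnert and Krcmar (2014) relies on full performance models instead, so all
names, values, and the 10% tolerance below are hypothetical.

```python
def predict_cpu_utilization(profile, workload, cores):
    """Utilization law: sum of (arrival rate * CPU demand), spread over all cores."""
    busy_seconds_per_second = sum(rate * profile[tx] for tx, rate in workload.items())
    return busy_seconds_per_second / cores


def changed_transactions(previous, current, tolerance=0.10):
    """Transactions whose demand differs by more than the tolerance (or are new).

    Demands are assumed to be positive.
    """
    changed = []
    for tx, demand in current.items():
        old = previous.get(tx)
        if old is None or abs(demand - old) / old > tolerance:
            changed.append(tx)
    return sorted(changed)


# Hypothetical resource profiles: CPU demand in seconds per transaction.
profile_v1 = {"browse": 0.010, "checkout": 0.040}
profile_v2 = {"browse": 0.010, "checkout": 0.060}
# Predefined workload (requests per second) and hardware environment.
workload = {"browse": 50.0, "checkout": 10.0}
cores = 4

util_v1 = predict_cpu_utilization(profile_v1, workload, cores)
util_v2 = predict_cpu_utilization(profile_v2, workload, cores)
relative_change = (util_v2 - util_v1) / util_v1

if relative_change > 0.10:
    print("performance change detected in:", changed_transactions(profile_v1, profile_v2))
```

The same two profiles can be evaluated for further workloads or core counts without running
additional tests, which is the main motivation for using models in this step.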

Compared to the number of works that introduce approaches to detect performance change, there
is relatively little work on identifying the reasons for a performance change. One of
the few examples is the work by Sambasivan et al. (2011). The authors propose an approach
to analyze the reasons for a change based on request execution flow traces. Their approach is
based on the response time behavior of single component operations involved in the request
processing and their control flow. It might be interesting to combine this approach with the
approach presented by Brunnert and Krcmar (2014) because the resource profiles and their
prediction results provide all the necessary data for the root cause search approach presented
by Sambasivan et al. (2011). This would make it possible to detect and analyze performance change in a
completely model-driven way.

Even though there is a lot of work going on in the area of performance change detection, a 
lot of open challenges remain, such as: 


• Test coverage is always limited: performance changes in areas that are not covered by a 
test workload cannot be detected. 

• Model-based performance change detection techniques are always associated with the risk 
that aspects with impact on performance are not properly reflected in a model. 

• A lot of the existing model-based performance evaluation techniques focus on CPU demand.
If memory, network, or HDD usage causes a performance regression, it cannot be detected
using such models.

• Test coverage is not only limited for software itself but also in terms of the amount of 
workloads and hardware environments that can be tested in a reasonable time frame. 


5 Application Performance Management During Operations 


Once an EA is running in a production environment it is important to continuously ensure that
it meets its performance goals. The activities required for this purpose are summarized by the
term APM. APM activities are required regardless of how well the SPE activities during development
outlined in the previous section have been executed. This is the case because assumptions
about the production environment or the workload can be wrong. Furthermore, performance
data collected during operations provides a lot of insights for the development teams to get
them from assumptions to knowledge. The key APM activities during operations are outlined
in this section as follows: Section 5.1 outlines one of the most fundamental activities, namely
performance monitoring. Afterwards, Section 5.2 covers performance problem detection and
diagnosis activities based on data collected using monitoring techniques. In order to reduce the
need for manual interaction, the section concludes in Section 5.3 with existing approaches and
challenges regarding the application of performance models to control the performance behavior
of an EA autonomously.


5.1 Performance Monitoring 

Kowall and Cappelli (2014) use the following five dimensions of APM functionality to assess
the commercial market of APM tools in their yearly report: a) end-user experience monitoring
(EUM), b) application topology discovery and visualization, c) user-defined transaction profiling,
d) application component deep dive, and e) IT operations analytics (ITOA).

The EUM dimension stresses the need to include the monitoring of client-side performance
measures, particularly end-to-end response times, instead of looking at only those performance
measures obtained inside the server boundaries. Major reasons for this requirement are that,
nowadays, a) EAs move considerable parts of the request processing to the client, e.g., rendering
of UIs in web browsers on various types of devices; and that b) networks are increasingly
influencing the user-perceived performance because EAs are accessed via different types of connections
(particularly mobile) and from locations all over the world. EUM is usually achieved by adding
instrumentation to the scripts executed on the client side and sending back the performance
measurements as part of subsequent interactions with the server.

Application topology discovery and visualization comprises the ability to automatically
detect and present information about the components and relationships of EA landscapes as well
as to make this information analyzable. An EA landscape consists of different types of physical
or virtual servers hosting software components that interact with each other and with
third-party services via different integration technologies, including synchronous remote calls and
asynchronous message passing. Application topologies are usually discovered by monitoring
agents deployed on the application servers, which send the obtained data to a central
monitoring database. Topologies are usually represented and visualized as navigable graph-based
representations depicting the system components and control flow. These graphs are enriched
by aggregated performance data such as calling frequencies, latencies, response times, and
success rates of transactions, as well as utilization of hardware resources. This information enables
operators to get an overall picture of a system’s health state, particularly with respect to its
performance, and to detect and diagnose performance problems.
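
How such an enriched topology graph can be derived from monitoring data is sketched below in
Python. The sketch assumes that agents report one record per observed call, consisting of
caller, callee, latency, and success flag. This record format and the component names are
hypothetical and do not correspond to the data model of any particular APM tool.

```python
from collections import defaultdict


def build_topology(call_records):
    """Aggregate call records into graph edges with calls, latency, and success rate."""
    totals = defaultdict(lambda: [0, 0.0, 0])  # calls, summed latency (ms), successes
    for caller, callee, latency_ms, succeeded in call_records:
        entry = totals[(caller, callee)]
        entry[0] += 1
        entry[1] += latency_ms
        entry[2] += 1 if succeeded else 0
    return {
        edge: {
            "calls": calls,
            "mean_latency_ms": latency / calls,
            "success_rate": successes / calls,
        }
        for edge, (calls, latency, successes) in totals.items()
    }


# Hypothetical records as they might arrive at a central monitoring database.
records = [
    ("web", "orders", 12.0, True),
    ("web", "orders", 18.0, True),
    ("orders", "db", 5.0, True),
    ("orders", "db", 7.0, False),
]

topology = build_topology(records)
for (caller, callee), metrics in topology.items():
    print(f"{caller} -> {callee}: {metrics}")
```

Each key of the resulting dictionary is a directed edge between two components, so the
structure can be handed directly to a graph visualization.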

User-defined transaction profiling refers to the functionality of mapping
implementation-level details about executed transactions (e.g., involved classes and methods, as well as their
performance properties) to their corresponding business transactions. This feature is useful in
order to assess the impact of performance properties and problems on business indicators. For
example, it can be evaluated which business functions, such as order processing, are affected in
case certain software components are slow or even unavailable.

Application component deep dive refers to the detailed tracing and presentation of call trees 
for transactions. The call trees include control flow information, including executed software 
methods, remote calls to third-party services, and exceptions that were thrown. The call tree 
structure is enriched by performance measurements, such as response times and execution times, 
as well as resource demands. 

Orthogonal to the previous dimensions, ITOA functionality aims to derive higher-order
information from the data gathered by the first four dimensions, e.g., by employing statistical
analysis techniques, including data mining. A typical example for ITOA is performance problem
detection and diagnosis as presented in Section 5.2.


The following list includes a summary of selected challenges regarding the current state of
performance monitoring with a focus on the APM tooling infrastructure:

• The most mature APM tools are closed-source software products provided by commercial
vendors. These tools provide comprehensive support for monitoring heterogeneous EA
landscapes, covering instrumentation support for various technologies and including novel
features such as adaptive instrumentation and automatic tuning for reduced performance
overhead. However, being closed source, the tools’ functionality often cannot be extended
or reused for other purposes in external tools. Also, details about the functionality are
generally not published. Moreover, researchers are usually not allowed to evaluate or
compare their research results with the capabilities of the tools as this is not permitted
according to the license agreements.

• The data collection functionalities of APM tools, including data about EA topologies,
transaction traces, and performance measures, are a valuable input for performance model
extraction approaches, as presented elsewhere in this report. However, all too often, no defined
interfaces exist to access this data from the tools. In order to increase the interoperability of
APM platforms, open or even common interfaces are desirable. In this way, researchers can
particularly contribute to the effectiveness of APM solutions by developing novel
(model-based) ITOA approaches that build on the data collected by the mature APM tools.

• The configuration of APM tools, due to the system-specificity, is very complex,
time-consuming, and error prone, particularly given the aforementioned faster release and
deployment cycles requiring a continuous refinement. For example, APM-specific questions
such as what and where to monitor are decisions mainly taken by operations teams.
However, performance models used during design time could be further exploited and extended
to specify operations-related monitoring aspects earlier and more systematically in the
system lifecycle. An automatic configuration of APM tools is desirable, e.g., based on higher
level and tool-agnostic descriptions of monitoring goals attached to architectural
performance models.


5.2 Problem Detection and Diagnosis 

Performance problem detection aims to reveal symptoms of present or upcoming system states
with degraded or suspicious performance properties, such as unusually high or low response times,
utilization of resources, or number of errors. The diagnosis step aims to reveal the root cause of
the performance problems indicated by the previously detected symptom(s). These steps are
comparable to the activities during anti-pattern detection outlined in Section 4.3. However,
their main difference is that problems revealed by anti-pattern detection approaches are
limited to scenarios that are known to cause problems, whereas general problem detection tries to
reveal problematic situations without prior knowledge. Approaches for problem detection and
diagnosis can be classified based on various dimensions. In this section, we focus on when the
analysis is conducted (before or after the problem occurs), who is performing it (a human or a
machine), and how it is performed (based on information about the system state or individual
transactions). Note that we are not aiming for a complete taxonomy or classification of
approaches. Instead, we give examples of how performance problem detection and diagnosis can be
conducted and which approaches exist.


When? Reactive approaches aim to detect and diagnose problems after they occurred,
using statistical techniques like detection of threshold violations or deviations from previously
observed baselines. Proactive approaches aim to detect and/or diagnose problems before they
occur (Salfner et al., 2010), using forecasts/predictions based on historic data for performance
measures of the same system. Forecasting and prediction techniques include mature statistical
techniques like time series forecasting, machine learning, as well as a combination of these
techniques with model-based performance prediction incorporating architectural knowledge about
the system (e.g., as in the approach by Rathfelder et al. (2012)).
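
A very simple form of proactive detection extrapolates a trend in historic measurements and
estimates when a threshold will be violated. The following Python sketch fits a straight line to
equally spaced utilization samples. The linear trend, the function names, and the sample values
are assumptions made for illustration; the cited approaches use considerably more elaborate
forecasting techniques.

```python
import math


def fit_line(values):
    """Least-squares fit of value = intercept + slope * index."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def steps_until_violation(values, threshold):
    """Number of future sampling steps until the fitted trend reaches the threshold.

    Returns 0 if the trend is already at or above the threshold and None if the
    trend is flat or decreasing.
    """
    intercept, slope = fit_line(values)
    current = intercept + slope * (len(values) - 1)
    if current >= threshold:
        return 0
    if slope <= 0:
        return None
    return math.ceil((threshold - current) / slope)


# Hypothetical CPU utilization in percent, one sample per hour.
utilization = [50, 55, 60, 65, 70]

print(steps_until_violation(utilization, threshold=90))  # 4
print(steps_until_violation([60, 60, 60], threshold=90))  # None
```

A reactive check would compare only the latest sample with the threshold, whereas the
extrapolation gives operators a lead time of several sampling intervals.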


Who? Problem detection is usually achieved by setting and controlling baselines on
performance measures of interest, both of which may be conducted manually or automatically.
Another approach is anomaly detection (Chandola et al., 2009; Marwede et al., 2009; Ehlers
et al., 2011), which aims to detect patterns in the runtime behavior that deviate from previously
observed behavior. If no automatic problem detection is in place, problems are often reported
by end users or by system operators that inspect the current health state of the system. Manual
diagnosis of problems is usually performed by inspecting monitored data or by reproducing and
analyzing the problem in a controlled development environment, using tools like debuggers and
profilers. In any case, expert knowledge about typical relationships between symptom and root
cause can be used to guide the diagnosis strategy (e.g., based on the performance anti-patterns
presented in Section 4.3).


How? State-based problem detection and diagnosis approaches reason about aggregate
behavior of system or component measures obtained from a certain observation period, e.g.,
response time percentiles or distributions, and total invocation counts. Note that data from
individual transactions (e.g., individual response times and control flow) may be used for the
aggregation and model generation, but are dropped after the aggregation step (e.g., as in the
approach of Agarwal et al. (2004)). Transaction-based approaches (e.g., the approach of Kiciman
and Fox (2005)) are usually triggered by symptoms of performance problems observed for (a
class of) transactions, e.g., high response times for a certain business transaction or error return
codes. For diagnosis, the transaction’s call tree is inspected similar to the process of profiling,
e.g., looking for methods with (exceptionally) high response times, high frequencies of method
invocations, or exceptions thrown.
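
The inspection of a transaction's call tree can be sketched as follows in Python. The sketch
computes for every method the time spent in the method itself (its response time minus the
response times of its callees) and ranks the methods accordingly. The CallNode structure, the
method names, and the timings are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CallNode:
    method: str
    response_time_ms: float
    children: list = field(default_factory=list)


def exclusive_time(node):
    """Time spent in the method itself, excluding the time of its callees."""
    return node.response_time_ms - sum(child.response_time_ms for child in node.children)


def hotspots(root):
    """All methods of a call tree, sorted by exclusive time in descending order."""
    ranking = []
    stack = [root]
    while stack:
        node = stack.pop()
        ranking.append((node.method, exclusive_time(node)))
        stack.extend(node.children)
    return sorted(ranking, key=lambda item: item[1], reverse=True)


# Hypothetical trace of one slow business transaction.
trace = CallNode("OrderController.submit", 500.0, [
    CallNode("OrderService.validate", 40.0),
    CallNode("OrderService.persist", 430.0, [
        CallNode("Jdbc.execute", 400.0),
    ]),
])

for method, own_time in hotspots(trace):
    print(f"{method}: {own_time} ms")
```

In this example the database call dominates the transaction, although its callers show the
highest response times when each method is looked at in isolation.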


The following list includes a summary of selected limitations of current approaches and 
research challenges: 

• Performance requirements, e.g., SLAs for response times, are often barely defined in
practice. This makes the intuitive approach of comparing (current or predicted) performance
measures with thresholds or baselines infeasible. Approaches are needed to automatically
derive meaningful baselines from historic measurements. Note that a feasible trade-off
of classical classification quality attributes such as false/true positives/negatives needs to
be achieved to let administrators trust the performance problem detection and diagnosis
solution.

• Faster development and deployment cycles, in addition to potentially multiple components
deployed in different versions at the same time, impose challenges to configurations of
problem detection and diagnosis approaches.

• Basic problem detection support, often based on automatically determined baselines, is
provided by some commercial APM tools. However, for researchers, it is often hard to
judge the underlying concepts, because the tools are closed source (exceptions exist) and
the concepts are protected by patents or not published at all. In order to improve the
comparability and interoperability of tools, open APM platforms are desirable.

• Basic automatic problem detection is accepted by administrators, given that the false/true 
positive/negative rates are in an acceptable range. As fully automatic problem resolution 
is usually not accepted, future work in the performance community could be to focus more 
on recommender systems for problem diagnosis and resolution, which can be based on 
expert knowledge. 


5.3 Models at Runtime 


Self-adaptive or autonomic software systems use models at runtime (Salehie and Tahvildari,
2009), which continuously perform activities in a control loop, as follows: a) updating the model
with monitoring data by integrating with appropriate monitoring facilities; b) learning, tuning,
and adjusting model parameters by adopting appropriate self-learning techniques; c)
employing the model to reason about adaptation, scaling, reconfiguration, repair, and other change
decisions. Several frameworks have been developed to implement runtime engines for these
activities. Recent examples include EUREMA (Vogel and Giese, 2014), iObserve (Heinrich et al.,
2014), MORISIA (Vogel et al., 2011), SLAstic (van Hoorn, 2014), and the work of Silva Souza
(2013).
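
The three activities of the control loop can be illustrated with a deliberately small Python
sketch. It assumes a model with a single parameter (the CPU demand per request), estimates this
parameter from monitoring data via the service demand law, smooths it exponentially, and uses it
for a scaling decision. The class, its parameters, and the target utilization are hypothetical
and unrelated to the frameworks named above.

```python
import math


class RuntimeModel:
    """Minimal model at runtime with one parameter: CPU demand per request (seconds)."""

    def __init__(self, demand_s, smoothing=0.5):
        self.demand_s = demand_s
        self.smoothing = smoothing

    def update(self, utilization, throughput_rps, instances):
        """Activities a) and b): feed monitoring data into the model and adjust it.

        Service demand law: demand = busy capacity / throughput. Each instance is
        assumed to provide one CPU; `utilization` is the average per instance.
        """
        observed = utilization * instances / throughput_rps
        self.demand_s = (1 - self.smoothing) * self.demand_s + self.smoothing * observed

    def required_instances(self, expected_rps, target_utilization=0.7):
        """Activity c): use the model to reason about a scaling decision."""
        needed = expected_rps * self.demand_s / target_utilization
        return max(1, math.ceil(needed))


# Design-time estimate, refined with one hypothetical monitoring sample.
model = RuntimeModel(demand_s=0.010)
model.update(utilization=0.6, throughput_rps=100.0, instances=2)

print(round(model.demand_s, 4))                      # 0.011
print(model.required_instances(expected_rps=200.0))  # 4
```

The smoothing factor controls how quickly the design-time estimate is replaced by observed
values, which also dampens the effect of noisy measurements.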


Both development and operations can benefit from models at runtime in the context of
DevOps, as illustrated in Figure 5.1. Initially, performance-augmented models and implementation
artifacts are deployed to operations. The parameterized models at runtime are continuously
updated and become more accurate by receiving runtime monitoring data. As mentioned above,
the performance models may serve as a basis to dynamically control a system’s performance
during operation. For instance, the updated models can be used for runtime capacity
management (Section 6.1), helping the operations team to decide about the resources for the system or
providing feedback for the auto-scaling mechanism. Models at runtime can have several purposes
for operations teams. They can be exploited as the source for monitoring aspects of a running
system, to affect the system via model manipulation, and as a basis for analytical methods,
such as model-based verification and model-based simulation. A promising role of models at
runtime for development in the context of DevOps is to bring back the adapted model along
with the associated runtime performance information to the development environment. The
gathered information can be used to analyze and improve software system performance based on
refined design-time artifacts such as software architectural descriptions, employing the
model-based evaluation techniques summarized in this report. Using the updated models, developers
can then update the system design and the underlying code, or the system can be automatically updated
by appropriate techniques that causally connect the models and the system (Chauvel et al.,
2013) in order to improve system performance. Note that all major components in Figure 5.1
reside in both Dev and Ops since the system itself has both design-time and runtime aspects.
The same holds for the models and for the development and operations teams.

Figure 5.1: Models at runtime in DevOps


Selected challenges regarding models at runtime in the context of DevOps include: 

• Monitoring sensors are not precise and their data contains noise. As a result, the model
parameters that must be estimated from such monitoring data are also affected by these
measurement inaccuracies. A relevant challenge in this context is to develop reliable and
robust estimation techniques that update model parameters accurately despite such
inaccuracies in measurements.

• Monitoring data is collected at the implementation level, which might deviate from the
model level to various degrees. For example, a monitoring record may contain the signature
of the invoked operation, class, and object, which must then be mapped to the
corresponding component instance or type at the model level. Such a mapping becomes even
more complicated when non-structural properties are observed. In many approaches, this
mapping is performed by a function that evaluates the signatures and maps them, based
on an algorithm, to a model constructed at runtime. However, in the context of pre-existing
design-time models, such automatically derived runtime models may not correspond to
their design-time counterparts. Therefore, design-time models in conjunction with a mapping
between code and model must be available at runtime to provide a correspondence of
the monitoring data to model elements.

• To be able to feed back knowledge gathered during monitoring, the runtime model must
either use a common meta-model with the design-time models or provide an accurate mapping
between both models. In case of a common subset, it is important to understand
the information flow from runtime to design-time and vice versa. For example, a changed
deployment at runtime must be reflected in the model.

• Runtime models, like user behavior models, are constructed from monitoring data and
are then used to predict future user and system behavior. However, certain events, such
as a sales event for a web-shop application, cannot be predicted correctly from observed
data. Therefore, it must be possible to feed in new user behaviors at runtime without
affecting the predicted behavior based on observed behavior.
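To make the first challenge above more concrete, the following sketch estimates a service demand
parameter from noisy monitoring samples. It is only one simple option and not a technique
proposed in this report: it applies the Service Demand Law (D = U / X) per monitoring interval
and takes the median, so that single outliers have little influence. The function name and the
sample format are assumptions made for illustration.

```python
from statistics import median


def estimate_service_demand(samples):
    """Estimate seconds of resource time per request from noisy samples.

    samples: iterable of (utilization, throughput_rps) pairs, one per
    monitoring interval (hypothetical record format).
    """
    # Service Demand Law per interval; skip idle intervals to avoid division by zero.
    ratios = [u / x for u, x in samples if x > 0]
    if not ratios:
        raise ValueError("no interval with positive throughput")
    # The median is less sensitive to outliers than the arithmetic mean.
    return median(ratios)


samples = [(0.20, 10.0), (0.41, 20.0), (0.59, 30.0), (0.95, 10.0), (0.0, 0.0)]
demand = estimate_service_demand(samples)
```

In this example the fourth sample is an outlier (0.095 s per request); the median stays close to
0.02 s, whereas the mean of the ratios would be almost twice as high.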


6 Evolution: Going Back-and-Forth between Development and 
Operations 

After a system has been initially designed, implemented, and deployed, the evolution phase of
software development starts. As phrased by Lehman’s first law of software evolution, a system
“must be continually adapted or it becomes progressively less satisfactory in use” (Lehman and
Ramil, 2001), and thus evolution and change are inevitable for a successful software system. In
the evolution phase, development and operations can be intertwined as shown in Figure 1.1.
While the system is operated continuously, development activities after initial deployment are
triggered by specific change events. With respect to performance, two types of triggers are most
relevant:


• Changing requirements: Changed functional requirements need to be incorporated in the
software architecture, software design, and implementation. Such new or changed functional
requirements can arise in the form of new feature requests. Alternatively,
these requirements may have been on the release plan for some time before they are
tackled in a subsequent development iteration. In addition to functional requirements, new or
changed quality requirements can also trigger development. Examples include requirements
for better response times as the users’ expectations have become higher over time
(as also discussed by Lehman and Ramil (2001)).

• Changing environment: In addition to changing requirements, changes in the environment
may also create a need to update a system’s design. For performance, the most important
type of environmental change is a changing workload, both in terms of a changing number
of users and in terms of a changing usage profile per user. A second common type of change
concerns the execution environment, such as a migration from on-premise to cloud. Such
environmental changes, if not properly addressed, can lead to either violating performance
requirements (in case of increasing workload) or inefficient operation of the system (in case
of decreasing workload). In addition to workload and execution environment, changes in
the quality properties of services that the SUA depends on can likewise cause the
performance of the SUA to change.




As a basis for performance-relevant decision making in the evolution phase, runtime information 
from the system’s production environment can be exploited using model-based performance 
evaluation techniques. As detailed in previous sections, this includes information about the 
system structure and behavior, as well as workload characteristics. This section focuses on two 
specific performance engineering activities within the evolution phase, namely capacity planning 
and management (Section 6.1), as well as software architecture optimization for performance 
(Section 6.2). 


6.1 Capacity Planning and Management 

Whenever an EA is being moved from development to operations, it is necessary to estimate
the required capacity (i.e., the amount and type of software and hardware resources) for given
workload scenarios and performance requirements. This is extremely important for completely
new deployments, but it is also required whenever major changes in the feature set or workload
of an application are expected. In case a new deployment needs to be planned, this activity is
usually referred to as capacity planning. As soon as a deployment exists and its capacity needs
to be adapted for changing workloads or performance requirements, this activity is referred to
as capacity management.

In order to plan capacity, it is important to consider not only performance goals but also other
perspectives such as costs. The latter include investments in hardware infrastructure and software
licenses, as well as operations expenditures, e.g., for system maintenance and energy. According
to Menasce and Almeida (2002), capacity is considered adequate if the performance goals
are met within given cost constraints (initial and long-term cost), and if the proposed
deployment topology fits within the technology standards of a corporation. Therefore, estimating the
required capacity for a deployment requires the creation of a workload model, a performance
model, and a cost model.
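How the three models interact can be sketched with deliberately simple stand-ins. In the
following example, the workload model is a single arrival rate, the performance model is an
M/M/1 queue per server with evenly balanced load, and the cost model is a fixed price per
server. All of these simplifications, as well as the function and parameter names, are
assumptions made for illustration and not the method of any cited approach.

```python
def mean_response_time(arrival_rate, service_demand, servers):
    """Performance model: M/M/1 approximation per server, load split evenly."""
    per_server_utilization = arrival_rate * service_demand / servers
    if per_server_utilization >= 1.0:
        return float("inf")  # saturated: the queue grows without bound
    return service_demand / (1.0 - per_server_utilization)


def plan_capacity(arrival_rate, service_demand, goal_s,
                  cost_per_server, budget, max_servers=1000):
    """Return the smallest adequate server count, or None if none fits the budget."""
    for servers in range(1, max_servers + 1):
        if servers * cost_per_server > budget:  # cost model: linear in servers
            return None
        if mean_response_time(arrival_rate, service_demand, servers) <= goal_s:
            return servers
    return None


# 100 requests/s, 20 ms demand per request, 50 ms mean response time goal.
servers = plan_capacity(arrival_rate=100.0, service_demand=0.02, goal_s=0.05,
                        cost_per_server=1000.0, budget=5000.0)
```

With these example numbers, three servers would yield a mean response time of about 60 ms, so
four servers are the smallest adequate capacity; with a budget of 3000, no adequate capacity
exists and the function returns None.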

Nowadays, a key challenge that leads to the importance of capacity management for existing
deployments is that it is often practically not feasible to evaluate the performance of all
deployments of an EA during development, as shown in Section 4. Therefore, operations teams cannot
expect that all their specific scenarios have been evaluated. A challenge for new deployments
is that not all deployments are known at the time of a release. Furthermore, there is often a
lack of information (e.g., about the resource demands of an application for specific transactions)
whenever a new deployment needs to be planned.

The traditional way of approaching capacity planning and management activities is to set up
a test environment, execute performance tests, and use the test results as input for capacity
estimations (King, 2004). As the test environments for such tests need to be comparable to the
final production deployments, this approach is associated with a lot of cost and manual labor.
Therefore, model-based performance evaluation techniques have been proposed in research in
order to reduce these upfront investment costs (Menasce and Almeida, 2002).

One example of a model-based capacity planning tool is proposed by Liu et al. (2004). This
tool can be used to support capacity planning for business process integration middleware. A
similar tool for component- and web service-based applications is proposed by Zhu et al. (2007).
However, their tool is intended to be used to derive capacity estimations from designs and
these estimations cannot be used for final capacity planning purposes. Brunnert et al. (2014b)
use resource profiles that serve as an information sharing entity between the different parties
involved in the capacity planning process. Resource profiles can be complemented with workload
and hardware environment models to derive performance predictions.
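The idea of complementing a resource profile with workload and hardware environment models can
be sketched as follows. The data layout is a strong simplification chosen for illustration and
is not the resource profile format of Brunnert et al. (2014b): a profile is assumed to hold
resource demands per transaction on reference hardware, the workload holds arrival rates, and
the hardware model holds a relative speed factor and a number of units per resource.

```python
def predict_utilization(resource_profile, workload, hardware):
    """Predict per-resource utilization with the Utilization Law (U = X * D).

    resource_profile: {transaction: {resource: demand in seconds on reference hardware}}
    workload:         {transaction: arrival rate in requests per second}
    hardware:         {resource: (speed factor relative to reference, number of units)}
    All three structures are hypothetical and only serve this example.
    """
    utilization = {}
    for transaction, rate in workload.items():
        for resource, demand in resource_profile[transaction].items():
            speed, units = hardware[resource]
            # Scale the reference demand to the target hardware and spread it over all units.
            share = rate * demand / (speed * units)
            utilization[resource] = utilization.get(resource, 0.0) + share
    return utilization


profile = {"login": {"cpu": 0.010, "disk": 0.002}, "search": {"cpu": 0.030}}
workload = {"login": 20.0, "search": 10.0}
hardware = {"cpu": (2.0, 4), "disk": (1.0, 1)}
predicted = predict_utilization(profile, workload, hardware)
```

Because the profile is independent of workload and hardware, the same profile can be reused to
evaluate several candidate deployments by exchanging only the other two inputs.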

As all of the aforementioned approaches require manual interaction to configure a model in
a way that performance, cost, and technological constraints are met, an automated optimization
is proposed by Li et al. (2010). The authors propose an automated sizing methodology for
Enterprise Resource Planning systems that takes hardware and energy costs into account. This
methodology tries to automatically find a deployment topology which provides adequate capacity
for the lowest total cost of ownership.

In addition to the previously mentioned capacity planning and management activities, which are
usually performed offline with a longer time horizon, a lot of work is currently being done in the
area of self-adaptation and runtime resource management (Kounev et al., 2011; van Hoorn, 2014).
However, it needs to be emphasized that architectures need to be specifically designed to handle
dynamically (de-)allocated resources during runtime. Therefore, additional research is ongoing
in the area of dynamically scalable (often called elastic) software architectures (Brataas et al.,
2013).


Selected challenges in the area of capacity planning and management include the following:

• Descriptions of the resource demand for EAs are still too limited in their capabilities (e.g.,
the number of resource types they cover).

• New system architectures such as big data systems require a refocus on storage performance
and algorithmic complexity.

• Energy consumption should be considered as part of a capacity planning activity, as it is
a major cost driver in data centers nowadays (Poess and Nambiar, 2008).

• The use of existing capacity planning and management approaches needs to be simplified
so that highly skilled performance engineers are not required at the time of planning a
deployment, since such planning is often necessary in a sales process without a project team
(Grinshpan, 2012).


6.2 Software Architecture Optimization for Performance 

Whenever a change triggers a (re)design phase, there is also an opportunity to question the
current architecture model and find potential for improvement. Architecture-based performance
prediction is not limited to design decisions that are directly affected by incoming changes.
Additionally, other design decisions taken can be reconsidered in the light of the changed
requirements and/or changed environment. As a key aspect of the overall DevOps culture is
automation, the remainder of this section discusses existing tools that can help to automatically
achieve such improvements.

A number of approaches that automatically derive performance-optimized software architectures
have been surveyed by Aleti et al. (2013). Up to 2011, the authors found 51 approaches
that aim to optimize some performance property. These approaches usually focus on specific
types of changes. For example, 37 of the approaches studied allocation of components, 6
approaches address component selection, and 20 approaches address service selection. Overall,
the explored changes were allocation, hardware replication, hardware selection, software
replication, scheduling, component selection, service selection, software selection, service
composition, software parameters, clustering, hardware parameters, architectural pattern,
partitioning, or maintenance schedules. Some approaches also considered problem-specific
additional changes and 5 approaches were general, i.e., they supported the modeling of any type
of change.

In addition to optimizing performance, most of the approaches also take potentially conflicting
additional objectives into account. Most commonly, costs of the solution are considered as
well (38 approaches), followed by reliability (25), availability (18), and energy (6).

Performance models extracted from production systems as outlined in Section 5.3 can be used
as a basis to optimize performance. For formulating an optimization problem on an architectural
performance model, it is required to specify an objective function and a list of possible changes
to be explored by the optimization algorithm.


Objective Function The associated solvers of the performance model already provide possible
evaluation functions. Such an evaluation function takes an architectural performance model as
an input and determines performance metrics, e.g., the mean response time, as an output.
By selecting a performance metric of interest, such as mean response time, we can define an
objective function for an optimization problem. Let $M$ denote the set of all valid architectural
performance models for a given architecture metamodel (e.g., all valid instances of the Palladio
Component Model). Let $f_p : M \to \mathbb{R}$ denote the evaluation function that determines the
selected performance metric $p$ for a given model $m \in M$. Then, $f_p$ can serve as an objective
function to optimize architecture models. In case of mean response time $mrt$, we aim to
minimize $f_{mrt}$.


Possible Changes In addition to defining what the objective function is, the other major 
component of the optimization is defining what can be changed. When optimizing architectural 
models for performance, we usually want to keep the functionality of the system unchanged.
Thus, we do not want to arbitrarily change the model, but only explore functionally-equivalent
models with varying performance properties. For example, in component-based architectures, 
the deployment of components to servers can change without changing the functionality of the 
system. Likewise, assigning a cache component or replacing middleware components should not 
change functionality. 

There are different approaches on how to encode possible changes. One is to enumerate the
set of open design decisions (Koziolek, 2013). For example, a decision could be on which server
to allocate a component. Another example for an open decision could be whether to add a
cache and where. Each open decision has a set of possible alternative options. For example, the
possible options for the first design decision could be that a component can be allocated to a set
of servers. As another example, the possible options for the second open design decision could
be that the cache can be placed in front of a set of components or nowhere.

Then, a single architectural candidate can be characterized by which option has been chosen 
for each open decision. One can picture this as a multidimensional decision space where each 
open decision is a dimension and each possible architectural candidate is a point in this space. In 
addition, it may be required to specify additional constraints on the decision space, such as that 
component A and component B may not be allocated to the same machine, e.g., due to security 
concerns. Thus, some of the architectural candidates in the decision space may be invalid. 
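The decision space described above can be sketched in a few lines. The decision names, options,
and the constraint below mirror the examples in the text (allocation of components A and B,
optional cache placement, A and B not on the same machine), but the concrete encoding as a
dictionary is an assumption made for illustration and not the encoding used by Koziolek (2013).

```python
from itertools import product

# Each open design decision is one dimension with a set of alternative options.
decisions = {
    "alloc_A": ["server1", "server2"],
    "alloc_B": ["server1", "server2"],
    "cache": ["none", "before_A", "before_B"],
}


def is_valid(candidate):
    """Constraint: components A and B may not share a machine."""
    return candidate["alloc_A"] != candidate["alloc_B"]


def enumerate_candidates(decisions, is_valid):
    """Yield every valid point of the decision space as a dictionary."""
    names = list(decisions)
    for options in product(*(decisions[name] for name in names)):
        candidate = dict(zip(names, options))
        if is_valid(candidate):
            yield candidate


candidates = list(enumerate_candidates(decisions, is_valid))
```

The full space has 2 x 2 x 3 = 12 points, of which 6 satisfy the constraint. Exhaustive
enumeration is only feasible for such small spaces; larger spaces require search heuristics.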


Optimization Problem Combining the objective function and the decision space, we can define
an optimization problem. A single-objective optimization could be to minimize the objective
function $f_p$ for the chosen performance metric of interest $p$ over the valid architectural
candidates in the decision space. If multiple objective functions are of interest, one can also
formulate a multi-objective problem with several objective functions $f_{p_1}, \ldots, f_{p_n}$
and search for the so-called Pareto-optimal solutions (Deb, 2005). A solution is Pareto-optimal
if no other solution is at least as good with respect to all objective functions and strictly
better with respect to at least one.
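Both problem variants can be illustrated on already evaluated candidates. In the following
sketch, each candidate is represented only by its objective values (mean response time in
seconds, cost), both of which are to be minimized; the values are made up for illustration, and
the pairwise comparison is a naive approach that is only suitable for small candidate sets.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Keep the points that are not dominated by any other point (minimization)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# (mean response time in s, cost) per architectural candidate; example values only.
evaluated = [(0.04, 4000), (0.06, 3000), (0.05, 5000), (0.10, 3000), (0.03, 8000)]

best_single = min(evaluated, key=lambda point: point[0])  # single objective: response time
front = pareto_front(evaluated)                            # multi-objective trade-offs
```

The single-objective optimum ignores cost entirely, whereas the Pareto front retains three
candidates that represent different trade-offs between response time and cost.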




Even though a lot of approaches exist to automatically improve a software architecture for 
performance and it is known how to specify a general optimization problem based on performance 
models, a few major challenges remain: 

• All the approaches that use performance models as input for a software architecture op¬ 
timization rely on the accuracy of the information represented in the model. Whenever a 
certain aspect of a software system is not represented, it cannot be optimized. It thus may 
be necessary to derive different model granularities for runtime optimization of systems 
and general architecture optimization. 


• Even though automatic performance model generation approaches exist, the specification 
of the possible changes to these models remains a manual step. It remains to be seen how 
the specification of these possibilities can be designed to make it simple for the users and 
thus increase the adoption rate. 


• Most software architecture optimization approaches surveyed by Aleti et al. (2013) are
limited in that they either focus on specific possible changes only, that they only support
simple performance prediction (e.g., very simple queuing models), or that they consider
no or few conflicting objectives. Thus, a general optimization framework for software
architectures could be devised, which could make use of a) plug-ins that interpret different
architecture models (from architecture description languages to component models) and
provide degree of freedom definitions and b) plug-ins to evaluate quality attributes for a
given architecture model.


7 Conclusion 


This report outlined activities assisting the performance-oriented DevOps integration with the
help of measurement- and model-based performance evaluation techniques. The report explained
performance management activities in the whole life cycle of a software system and presented
corresponding tools and studies. Following a general section about existing measurement- and
model-based performance evaluation techniques, the report focused on specific activities in the
development and operation phases. Afterwards, it outlined activities during the evolution phase
when a system is going back-and-forth between development and operation.

A key success factor for all the integrated activities outlined in this report is the
interoperability between the different tools and techniques. For example, an architect might create a
deployment architecture of a software system, conduct several studies, then move the model to
a tool better suited to performance analysis. For models, approaches such as the Performance
Model Interchange Format (PMIF) (Smith et al., 2010) exist to help in this process. When
someone from operations wants to communicate metrics to someone from development,
it is necessary to be able to exchange the metrics in a common format. For this use case,
formats such as the Common Information Model (CIM) Metrics Model (Distributed Management
Task Force (DMTF), Inc., 2003), the Structure Metrics Meta-Model (SMM) (Object Management
Group, Inc., 2012), or the performance monitoring specification of the Open Services for
Lifecycle Collaboration (2014) exist. However, even though such approaches exist, they need to be
supported by multiple vendors in order to work. It remains to be seen which of these
approaches and specifications will establish themselves as an industry standard.

As of today, the outlined approaches exist in theory and practice, but most model-based
approaches are developed in academia and not in an industry context. Furthermore, most
development and operations activities are not yet tightly integrated. Integrating the proposed
approaches and increasing the degree of automation are key challenges for applying performance
models in an industry context and supporting Dev and Ops in terms of performance improvements.
Several approaches focus on specific systems and need to be generalized for broader usage
scenarios. Further validation on large industry projects would increase the level of trust and
readiness to assume the costs and risks of applying performance models. Both industry and academia
have to address these challenges to enable a fully performance-oriented DevOps integration.


8 Acronyms 


APM       Application Performance Management
API       Application Program Interface
CBMG      Customer Behavior Model Graph
CD        Continuous Delivery
CDE       Continuous Deployment
CI        Continuous Integration
CIM       Common Information Model
CPU       Central Processing Unit
CSM       Core Scenario Model
Dev       development
DLIM      Descartes Load Intensity Model
DML       Descartes Modeling Language
EA        enterprise application
EFSM      Extended Finite State Machine
EJB       Enterprise Java Bean
EUM       end-user experience monitoring
HDD       hard disk drive
Java EE   Java Enterprise Edition
JPA       Java Persistence API
IDE       Integrated Development Environment
I/O       Input / Output
IT        Information Technology
ITOA      IT operations analytics
LibReDE   Library for Resource Demand Estimation
LIMBO     Load Intensity Modeling Tool
LQN       Layered Queueing Network
MARTE     UML Profile for Modeling and Analysis of Real-Time and Embedded Systems
PCM       Palladio Component Model
PMIF      Performance Model Interchange Format
QN        Queueing Network
QPN       Queueing Petri Net
OSLC      Open Services for Lifecycle Collaboration
Ops       operations
SDL       Specification and Description Language
SLA       Service-level Agreement
SMM       Structure Metrics Meta-Model
SPE       Software Performance Engineering
SPL       Stochastic Performance Logic
SUA       System Under Analysis
UI        user interface
UML       Unified Modeling Language
UML-SPT   UML Profile for Schedulability, Performance and Time